
Contributions on evolutionary computation for statistical inference / Rizzo, Manuel. - (2018 Feb 26).

### Contributions on evolutionary computation for statistical inference

#### Abstract

Evolutionary Computation (EC) techniques were introduced in the 1960s for dealing with complex situations. One example is an optimization problem that has no analytical solution or is computationally intractable; in many such cases these methods, named Evolutionary Algorithms (EAs), have been successfully implemented. In statistics there are many situations where complex problems arise, in particular concerning optimization. A general example is when the statistician needs to select a single element from a prohibitively large discrete set, where the element could be a model, a partition, an experiment, or similar: this is the case in model selection, cluster analysis or design of experiments. In other situations there may be an intractable function of the data, such as a likelihood, which needs to be maximized, as happens in model parameter estimation. These kinds of problems are naturally well suited to EAs, and in the last 20 years a large number of papers have been concerned with applications of EAs to statistical issues. The present dissertation belongs to this part of the literature: it reports several implementations of EAs in statistics, focusing mainly on statistical inference problems. Original results are proposed, as well as overviews and surveys on several topics. EAs are employed and analyzed from various statistical points of view, showing and confirming their efficiency and flexibility.

The first proposal is devoted to parametric estimation problems. When EAs are employed in such an analysis, a novel form of variability, related to their stochastic elements, is introduced. We analyze both the variability due to sampling, associated with the selected estimator, and the variability due to the EA. This analysis is set in the framework of the statistical and computational tradeoff question, crucial in present-day problems, by introducing cost functions related to both data acquisition and EA iterations. The proposed method is illustrated by means of model building examples.

The subsequent chapter is concerned with EAs employed in Markov Chain Monte Carlo (MCMC) sampling. When sampling from a multimodal or highly correlated distribution, a possible strategy is to run several chains in parallel in order to improve their mixing. If these chains are allowed to interact with each other, many analogies with EC techniques can be observed, and this has led to research in many fields. The chapter reviews various methods found in the literature which combine EC techniques and MCMC sampling, in order to identify specific and common procedures and to unify them in an EC framework.

In the last proposal we present a complex time series model and an identification procedure based on Genetic Algorithms (GAs). The model is capable of dealing with seasonality, through Periodic AutoRegressive (PAR) modelling, and with structural changes over time, leading to a nonstationary structure. Because a very large number of parameters and possible change points are involved, GAs are appropriate for identifying such a model. The effectiveness of the procedure is shown on both simulated data and real examples, the latter consisting of river flow data in hydrology. The thesis concludes with some final remarks, including directions for future work.
26 Feb 2018
Files attached to this item
File: Doctoral thesis, Rizzo (open access)

Type: Doctoral thesis
License: Creative Commons
Size: 1.75 MB
Use this identifier to cite or link to this document: `https://hdl.handle.net/11573/1080410`