Innovative approaches in spatio-temporal modeling: handling data collected by new technologies

ALAIMO DI LORO, Pierfrancesco

This thesis illustrates and puts in context two of the main research projects I worked on during my Ph.D. program, in collaboration with several national and international co-authors from "La Sapienza" and other prestigious universities. Both research lines concern the spatial and spatio-temporal analysis of geo-referenced datasets, a topic of broad and current interest in the statistical research literature and in applications. My focus on this area of statistics was not planned before the start of the program. However, while pursuing my original research interests in the broader domain of Bayesian statistics, I realized there was an ever-increasing demand for viable and efficient statistical methods to analyze spatial and spatio-temporal data. That demand is a consequence of the extraordinary technological development that has transformed data collection systems during the last few decades. Innovative, cutting-edge technologies have produced new devices that can record and store data and information about the most diverse phenomena, possibly at a fine spatial scale and with high temporal resolution. Such capabilities were just a dream up to 20 or 30 years ago. Spatial statistics methods are rapidly evolving to face this surge of novel data structures in various application fields: geology, meteorology, ecology, epidemiology, economics, politics, and more. The first chapter of this thesis introduces the general idea behind spatial statistics, that is, the branch of statistics devoted to analyzing and modeling temporal and spatial structure in time-referenced and/or geo-referenced datasets. A brief historical introduction to its development is provided, starting from the first (sometimes unwitting) applications of its logic to practical and theoretical problems at the end of the nineteenth century. Many methods and techniques in this domain evolved independently, driven by the specific needs of the application fields in which they were developed.
The historical excursus leads to a coarse (but reasonable) distinction into three main areas: continuous spatial variation, discrete spatial variation, and spatial point patterns. Each of these areas has further facets of its own, making spatial statistics an incredibly diverse and rich topic. A truly comprehensive review would require an entire book to be written and perhaps a lifetime to be thoroughly studied. Therefore, in the following chapters, the discussion is focused on the specific areas and techniques used in the studies. Only those tools that proved valuable for the analyses performed in Alaimo Di Loro et al. (2021) and Kalair et al. (2020) are treated extensively. The second chapter focuses on the analysis of continuous spatial variation, that is, the modeling of outcomes that vary continuously over some space. First, the most relevant properties of continuous spatial processes are introduced; second, some of the most common methodologies for performing spatial interpolation of the mean trend and stochastic modeling of the residuals are listed and sketched. In particular, the chapter discusses spline regression as a valid technique for capturing the first-order structure in spatial data. It then presents geostatistical methods and the Bayesian hierarchical framework as invaluable tools for the simultaneous estimation of the first- and second-order structure of a process. Extension to spatio-temporal contexts is not as trivial as it may seem and must be approached with due care. An extensive discussion of the possible pitfalls and viable solutions is included in the same chapter. Finally, the problems arising in the analysis of big spatial data are highlighted in the last section, where the Nearest Neighbor Gaussian Process (NNGP, Datta et al. (2016a,b)) model is introduced as a highly scalable framework for providing full inference on massive spatial and spatio-temporal datasets.
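To make the scalability argument concrete, the following is a minimal sketch of the nearest-neighbor idea, not the implementation used in the thesis. It assumes a zero-mean process observed at sorted one-dimensional locations with an exponential covariance; the function names, the parameters `sigma2` and `phi`, and the small `nugget` are illustrative assumptions. The joint density is replaced by a product of univariate conditionals, each conditioning on at most `m` preceding points.

```python
import numpy as np

def exp_cov(t1, t2, sigma2=1.0, phi=1.0):
    """Exponential covariance between two sets of 1-D locations (assumed kernel)."""
    d = np.abs(np.asarray(t1)[:, None] - np.asarray(t2)[None, :])
    return sigma2 * np.exp(-phi * d)

def full_gp_loglik(y, t, nugget=1e-8):
    """Exact zero-mean Gaussian process log-likelihood; cost grows as O(n^3)."""
    n = len(y)
    K = exp_cov(t, t) + nugget * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def nngp_loglik(y, t, m, nugget=1e-8):
    """Nearest-neighbor approximation: each point is conditioned only on
    the (at most) m points that immediately precede it in the ordering."""
    ll = 0.0
    for i in range(len(y)):
        nb = np.arange(max(0, i - m), i)  # neighbor set among earlier points
        var = exp_cov(t[i:i + 1], t[i:i + 1])[0, 0] + nugget
        mean = 0.0
        if nb.size > 0:
            C_nn = exp_cov(t[nb], t[nb]) + nugget * np.eye(nb.size)
            c_in = exp_cov(t[i:i + 1], t[nb])[0]
            w = np.linalg.solve(C_nn, c_in)  # kriging weights on the neighbors
            mean = w @ y[nb]
            var -= w @ c_in
        ll += -0.5 * np.log(2 * np.pi * var) - 0.5 * (y[i] - mean) ** 2 / var
    return ll
```

When `m` equals `n - 1`, every point conditions on all of its predecessors and the product recovers the exact likelihood; with a small `m`, each factor only requires solving an `m`-by-`m` system, which is the source of the scalability.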
The third chapter includes an extended version of the paper Alaimo Di Loro et al. (2021), currently under review and available as a preprint. It describes how the aforementioned technological development has strongly affected human tracking and monitoring capabilities, generating substantial interest in monitoring human activity. New non-intrusive wearable devices, such as wrist-worn sensors that monitor gross motor activity (miniature accelerometers), can continuously record individual activity levels, producing massive amounts of high-resolution measurements. The analysis of such data must account for spatial and temporal information on the trajectories or paths traversed by the subjects wearing such devices. Inferential objectives include estimating a subject’s physical activity levels along a given trajectory, identifying trajectories that are more likely to produce higher levels of physical activity for a given subject, and predicting expected levels of physical activity along any proposed new trajectory for a given set of health attributes. We argue that the underlying process is more appropriately modeled as a stochastic evolution through time, while accounting for spatial information separately. Building upon recent developments in this field, we construct temporal processes using directed acyclic graphs (DAG) along the lines of the NNGP, include spatial dependence through penalized spline regression, and develop optimized implementations of the collapsed Markov chain Monte Carlo (MCMC) algorithm. The resulting Bayesian hierarchical modeling framework for the analysis of spatial-temporal actigraphy data is able to deliver fully model-based inference on trajectories while accounting for subject-level health attributes and spatial-temporal dependencies.
We undertake a comprehensive analysis of an original dataset from the Physical Activity through Sustainable Transport Approaches in Los Angeles (PASTA-LA) study to formally ascertain spatial zones and trajectories exhibiting significantly higher physical activity levels. Suggestions for further extensions and improvements of the currently adopted methodology are discussed in the last section of the chapter. Chapter four marks a change of paradigm and introduces the basic theory and tools of spatial point pattern analysis. Some common probabilistic models for point processes are briefly discussed, and some of their properties and limitations are highlighted. The rest of the chapter is instead entirely focused on the Hawkes process and its spatio-temporal extension. The Hawkes process is a particular kind of self-exciting point process with a strong inter-dependence structure. Although it was introduced in Hawkes (1971a), its use in statistical applications was long limited to the analysis of earthquake dynamics. The recent growth of data at high temporal resolution, sometimes accompanied by spatial information, has favored its use in modeling event dynamics in diverse fields: finance, society, biology, etc. In particular, its defining properties are presented and state-of-the-art estimation methods for the spatio-temporal version are introduced. In the fifth chapter, the semi-parametric Hawkes process with a periodic background, originally introduced in Zhuang and Mateu (2019), is outlined. Although very recent, it has already proved very useful for modeling phenomena that are likely to present a cyclic pattern. It assumes that primary events occur as an effect of the background intensity, while secondary events are associated with the self-excitation effect.
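As a reading aid, the decomposition into primary and secondary events can be written through the conditional intensity of a generic spatio-temporal Hawkes process. The notation below is an assumption made for illustration (it is not taken from the thesis): \mu denotes the background intensity, g the triggering kernel, and (t_i, s_i) the time and location of the i-th past event.

```latex
\lambda(t, s \mid \mathcal{H}_t)
  = \underbrace{\mu(t, s)}_{\text{background (primary events)}}
  + \underbrace{\sum_{i \,:\, t_i < t} g(t - t_i,\; s - s_i)}_{\text{self-excitation (secondary events)}}
```

In the periodic-background variant, \mu(t, s) would additionally carry the cyclic temporal components; their exact form is not spelled out in this abstract and is therefore left unspecified here.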
There are sound motivations for its use in the context of road accident dynamics: for example, excitation may occur when a driver, reacting to the disruption of one accident, triggers a subsequent accident upstream of the first one. The proposed framework is tested in two original applications, each based on an original dataset: the first one, somewhat preliminary, involves the modeling and analysis of road accidents that occurred on the urban road network of Rome, in Italy; the second is instead a conclusive analysis recently published in Kalair et al. (2020), conducted on a collection of road accidents that occurred on the M25 London Orbital, in the United Kingdom. Adaptations of the original methodology to the road accident setting were deemed necessary in both cases to account for specific features of car accidents and the geometry of the underlying space. The final results allow a meaningful interpretation of the temporal and spatial background, which captures the typical commuting behavior of the Roman and London communities. The self-excitation component appears to have slightly different intensities in the two contexts, suggesting excitation mechanisms that vary between urban networks and motorways. Finally, the sixth chapter summarizes the main steps of the thesis, highlighting the original contributions of the previous chapters. It also tries to distill a take-home message: statistical modeling is of fundamental importance as a scientific tool to formulate and verify hypotheses, and its use must not be discouraged by new challenges and technological advancements.

Innovative approaches in spatio-temporal modeling: handling data collected by new technologies / ALAIMO DI LORO, Pierfrancesco. - (2021 Jul 13).