In environmental statistics and epidemiology, most of the literature focuses on regression models for the conditional mean of a distribution. However, there is growing interest in extreme exposure events, such as heat waves or air pollution peaks. New exposure metrics focused on extreme episodes will allow epidemiologists to evaluate the related health effects and policymakers to implement intervention and awareness strategies, especially for the most vulnerable segments of the population. This study models extreme concentrations of nitrogen dioxide (NO2) in the Lazio Region between 2017 and 2022 using quantile regression. We modeled daily NO2 concentrations from 51 monitoring stations provided by ARPA Lazio. The time window was restricted to limit the share of missing values (3.65%). The original values were square-root transformed, and missing values were imputed using univariate spline-based methods that also take into account the proximity of the stations. We then introduced external covariates available at a 1 km × 1 km resolution, drawn from major satellite datasets (e.g. Copernicus ERA5), downscaling both spatio-temporal variables (temperature, precipitation, planetary boundary layer height) and purely spatial ones designed to capture human presence (resident population, nighttime lights, etc.). Our proposal consists of two steps. Step 1. Local Bayesian models were fitted at three quantile levels (0.5, 0.75, and 0.90) to capture the temporal dynamics of each station across quantiles. Model selection was performed using time-based 10-fold cross-validation at the station level, with R1 as the primary evaluation metric. Step 2. The estimates from Step 1 were then used as response variables in Generalized Additive Models, which incorporated the effects of the covariates. The objective of this step was to spatialize the results within a hybrid Bayesian semiparametric framework. The models underwent model selection and were subsequently evaluated using Leave-One-Station-Out Cross-Validation (LOSOCV), along with additional spatial and temporal control regressions. The final outputs can be used to generate spatio-temporal extreme exposure surfaces for each quantile, which can subsequently be linked to different health outcomes to explore the acute effects of extreme environmental exposures.
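The evaluation ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "R1" denotes the Koenker-Machado goodness-of-fit measure for quantile regression (one minus the ratio of the model's check loss to that of the unconditional quantile), which the abstract does not spell out. The function names (`pinball_loss`, `r1_score`, `leave_one_station_out`) and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np


def pinball_loss(y, q_pred, tau):
    """Mean check (pinball) loss of predictions q_pred at quantile level tau."""
    u = np.asarray(y, dtype=float) - np.asarray(q_pred, dtype=float)
    # rho_tau(u) = tau * u if u >= 0, else (tau - 1) * u
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))


def r1_score(y, q_pred, tau):
    """Assumed R1: 1 - loss(model) / loss(unconditional tau-quantile of y)."""
    y = np.asarray(y, dtype=float)
    baseline = np.full_like(y, np.quantile(y, tau))
    return 1.0 - pinball_loss(y, q_pred, tau) / pinball_loss(y, baseline, tau)


def leave_one_station_out(station_ids):
    """Yield (held_out_station, train_idx, test_idx), one fold per station."""
    station_ids = np.asarray(station_ids)
    for s in np.unique(station_ids):
        test_idx = np.flatnonzero(station_ids == s)
        train_idx = np.flatnonzero(station_ids != s)
        yield s, train_idx, test_idx


# Synthetic illustration only: 5 hypothetical stations, 200 days each.
rng = np.random.default_rng(0)
stations = np.repeat(np.arange(5), 200)
no2 = rng.gamma(shape=4.0, scale=6.0, size=stations.size)
y = np.sqrt(no2)  # square-root transform, as described in the abstract

for tau in (0.5, 0.75, 0.90):
    fold_scores = []
    for s, train_idx, test_idx in leave_one_station_out(stations):
        # Placeholder "model": the empirical tau-quantile of the training stations.
        q_hat = np.quantile(y[train_idx], tau)
        pred = np.full(test_idx.size, q_hat)
        fold_scores.append(r1_score(y[test_idx], pred, tau))
    print(f"tau={tau}: mean held-out R1 = {np.mean(fold_scores):.3f}")
```

In this sketch the placeholder predictor carries no covariate information, so its held-out R1 should be close to zero or slightly negative. A fitted quantile model would replace the `q_hat` line, and the same splitting function would hold out every observation of one station at a time.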
Evaluating extreme NO2 exposures in the Lazio region: a hybrid bayesian semi-parametric quantile regression approach / Rosci, Edoardo; Jona-Lasinio, Giovanna; Michelozzi, Paola; Stafoggia, Massimo. - (2025), pp. 123-123. (Intervento presentato al convegno GRASPA 2025 tenutosi a Rome; Italy).
Evaluating extreme NO2 exposures in the Lazio region: a hybrid bayesian semi-parametric quantile regression approach
Edoardo Rosci;Giovanna Jona-Lasinio;
2025