Addressing Missing Data in Eddy-Covariance Time Series: A Comparative Study of Multiple Imputation Strategies / Vitale, Domenico; Tancredi, Andrea. - (2025), p. 131. (Paper presented at the GRASPA2025 conference held in Rome, Italy).
Addressing Missing Data in Eddy-Covariance Time Series: A Comparative Study of Multiple Imputation Strategies
Domenico Vitale; Andrea Tancredi
2025
Abstract
In the era of climate change, continuous monitoring of greenhouse gas (GHG) exchanges between terrestrial ecosystems and the atmosphere is crucial. A widely used technique for large-scale, long-term monitoring is the Eddy-Covariance (EC) method, which produces half-hourly time series of surface fluxes of the main GHGs, such as water vapor (H₂O), carbon dioxide (CO₂), methane (CH₄), and nitrous oxide (N₂O), along with various micro-meteorological variables. Despite significant advances in sensor technology and data acquisition systems, EC time series frequently contain substantial data gaps (20-60% of the data annually) caused by sensor limitations, adverse conditions, and strict quality control. These gaps hinder critical analyses such as the estimation of annual GHG budgets. Multiple imputation (MI) offers a principled approach to handling missing data that allows valid statistical inference. However, most MI algorithms are not designed to accommodate the characteristics of high-frequency, multivariate time series such as those produced by EC systems, and their performance in this context remains under-explored.

To address this gap, this study evaluates three MI methods tailored to EC datasets. 1) EMB, an MI algorithm based on Expectation-Maximization with Bootstrapping (Honaker and King, 2010), which assumes multivariate normality of the data; to meet this assumption, it integrates time-series features such as polynomial trends, lagged variables, and hydro-ecological regime segmentation, as detailed in Vitale et al. (2018). 2) RF, a random forest-based MI (Doove et al., 2014) implemented within a fully conditional specification framework, which captures nonlinearities and variable interactions without requiring strong distributional assumptions. 3) XGB, an MI method based on Extreme Gradient Boosting (Deng and Lumley, 2023), a scalable and efficient implementation of gradient boosting that is particularly suited to high-dimensional, structured datasets such as EC time series.

Performance was assessed via Monte Carlo simulations using the FLUXNET2015 dataset (Pastorello et al., 2020). In addition to comparing algorithmic performance, the study reviews current imputation evaluation metrics, emphasizing the need for robust, context-specific criteria for environmental time series. The findings provide guidance for selecting MI strategies in EC data-processing pipelines, with implications for improving the accuracy of ecosystem carbon balance estimates.
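EMB fits a multivariate normal model to the data matrix, so the time-series structure mentioned in the abstract has to be supplied as extra columns. The sketch below shows one way to do this for a single variable, in the spirit of the `polytime`, `lags` and `leads` options of the Amelia II R package; it is an assumption about how such features can be built, not the authors' code. The function names, column names and regime labels (`"dry"`, `"wet"`) are hypothetical, and "regime segmentation" is read here simply as fitting the imputation model separately within labelled segments.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]


def build_ts_features(values: Sequence[Optional[float]], degree: int = 3) -> List[Row]:
    """Augment one half-hourly variable with polynomial time trends and lag/lead terms.

    Gaps are represented as None; they propagate into the lag/lead columns so that
    the imputation model treats those cells as missing as well.
    """
    n = len(values)
    rows: List[Row] = []
    for t, y in enumerate(values):
        # Rescale time to [0, 1] so higher-order polynomial terms stay well scaled.
        tau = t / (n - 1) if n > 1 else 0.0
        row: Row = {"y": y}
        for d in range(1, degree + 1):
            row[f"time^{d}"] = tau ** d
        # Lag-1 and lead-1 copies of the variable (None at the series boundaries).
        row["y_lag1"] = values[t - 1] if t > 0 else None
        row["y_lead1"] = values[t + 1] if t < n - 1 else None
        rows.append(row)
    return rows


def split_by_regime(rows: Sequence[Row], regimes: Sequence[str]) -> Dict[str, List[Row]]:
    """Group rows by a (hypothetical) regime label so a model can be fitted per regime."""
    if len(rows) != len(regimes):
        raise ValueError("one regime label per row is required")
    groups: Dict[str, List[Row]] = {}
    for row, label in zip(rows, regimes):
        groups.setdefault(label, []).append(row)
    return groups


# Usage: a short series with one gap, split into two illustrative regimes.
feats = build_ts_features([1.0, None, 3.0, 4.0], degree=2)
groups = split_by_regime(feats, ["dry", "dry", "wet", "wet"])
print(feats[1])
```

Adding lead terms alongside lags lets the observations on both sides of a gap inform the imputation, which matters for the long contiguous gaps typical of EC records.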
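A Monte Carlo assessment of this kind can be sketched as follows: start from a complete series, hide contiguous blocks of values (EC gaps often come in runs), impute the gappy series M times, and score the imputations on the hidden points. This is a minimal, standard-library-only sketch under stated assumptions: the synthetic diurnal series stands in for FLUXNET2015 data, `interpolate_with_noise` is a hypothetical placeholder stochastic imputer (not EMB, RF or XGB), and the two scores (RMSE on masked points and relative bias of the series total, a rough proxy for an annual budget) are illustrative choices rather than the metrics used in the study.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Series = List[Optional[float]]
Imputer = Callable[[Series, random.Random], List[float]]


def mask_contiguous_gaps(truth: Sequence[float], gap_len: int, n_gaps: int,
                         rng: random.Random) -> Tuple[Series, List[int]]:
    """Hide n_gaps runs of gap_len consecutive values (runs may overlap).

    The first value and the last two values always stay observed, so every gap
    is bounded by observations on both sides.
    """
    gappy: Series = list(truth)
    masked = set()
    for _ in range(n_gaps):
        start = rng.randrange(1, len(truth) - gap_len - 1)
        masked.update(range(start, start + gap_len))
    for i in masked:
        gappy[i] = None
    return gappy, sorted(masked)


def interpolate_with_noise(gappy: Series, noise_sd: float, rng: random.Random) -> List[float]:
    """Hypothetical placeholder imputer: linear interpolation plus Gaussian noise.

    Stands in for EMB, RF or XGB. Assumes the first and last values are observed.
    """
    filled = list(gappy)
    observed = [i for i, v in enumerate(gappy) if v is not None]
    for left, right in zip(observed, observed[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * gappy[left] + w * gappy[right] + rng.gauss(0.0, noise_sd)
    return filled


def monte_carlo_eval(truth: Sequence[float], impute: Imputer, n_runs: int = 50, m: int = 5,
                     gap_len: int = 48, n_gaps: int = 3, seed: int = 0) -> Tuple[float, float]:
    """Return (mean RMSE on masked points, mean relative bias of the series total)."""
    rng = random.Random(seed)
    true_total = sum(truth)
    rmses, rel_biases = [], []
    for _ in range(n_runs):
        gappy, masked = mask_contiguous_gaps(truth, gap_len, n_gaps, rng)
        imputations = [impute(gappy, rng) for _ in range(m)]
        # RMSE over every masked point of every one of the m imputations.
        sq_err = [(imp[i] - truth[i]) ** 2 for imp in imputations for i in masked]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
        # Relative bias of the total averaged over the m imputations (budget proxy).
        pooled_total = sum(sum(imp) for imp in imputations) / m
        rel_biases.append((pooled_total - true_total) / true_total)
    return sum(rmses) / n_runs, sum(rel_biases) / n_runs


# Usage: 20 days of a synthetic half-hourly diurnal cycle (48 values per day).
truth = [10.0 + 5.0 * math.sin(2 * math.pi * t / 48) for t in range(48 * 20)]
rmse, rel_bias = monte_carlo_eval(truth, lambda s, r: interpolate_with_noise(s, 0.5, r))
print(f"mean RMSE: {rmse:.2f}, mean relative bias of total: {rel_bias:+.2%}")
```

Swapping the placeholder for a wrapper around a real MI implementation would turn this into a simple comparison harness; the gap length and number of gaps here are arbitrary and would need to mimic the gap patterns actually observed at EC sites.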
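The valid inference that MI provides rests on pooling the M completed-data analyses. A standard way to do this is Rubin's rules, which the abstract does not name explicitly; the sketch below assumes each imputed dataset yields an estimate (for example, an annual GHG budget) and its within-imputation variance. With pooled estimate Q̄ (the mean of the M estimates), mean within-imputation variance Ū and between-imputation variance B, the total variance is T = Ū + (1 + 1/M)B. The function name `pool_rubin` is hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine M completed-data estimates with Rubin's rules.

    Returns the pooled point estimate and its total variance
    T = U_bar + (1 + 1/M) * B.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, each with one variance")
    q_bar = statistics.fmean(estimates)    # pooled point estimate
    u_bar = statistics.fmean(variances)    # average within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance (divisor M - 1)
    return q_bar, u_bar + (1 + 1 / m) * b


# Usage: three hypothetical imputed-data estimates, each with unit variance.
estimate, total_var = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(estimate, total_var)
```

In this example the pooled estimate is 11.0 and the total variance is 1 + (4/3)·1 ≈ 2.33. The between-imputation term B is exactly what a single imputation would leave out, which is why single imputation tends to understate uncertainty.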
| File | Size | Format | Access |
|---|---|---|---|
| Vitale_Abstract-GRASPA_2025.pdf (type: publisher's version, published with the publisher's layout; license: all rights reserved) | 617.28 kB | Adobe PDF | Restricted to archive managers; contact the author |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


