Addressing Missing Data in Eddy-Covariance Time Series: A Comparative Study of Multiple Imputation Strategies / Vitale, Domenico; Tancredi, Andrea. - (2025), pp. 131-131. (Paper presented at the GRASPA2025 conference, held in Rome, Italy).

Addressing Missing Data in Eddy-Covariance Time Series: A Comparative Study of Multiple Imputation Strategies

Domenico Vitale; Andrea Tancredi
2025

Abstract

In the era of climate change, continuous monitoring of greenhouse gas (GHG) exchanges between terrestrial ecosystems and the atmosphere is crucial. A widely used technique for large-scale, long-term monitoring is the Eddy-Covariance (EC) method, which produces half-hourly time series of surface fluxes for the main GHGs, such as water vapor (H₂O), carbon dioxide (CO₂), methane (CH₄), and nitrous oxide (N₂O), along with various micro-meteorological variables. Despite significant advances in sensor technology and data acquisition systems, EC time series frequently contain substantial data gaps (20-60% annually) due to sensor limitations, adverse conditions, and strict quality control. These gaps hinder critical analyses such as the estimation of annual GHG budgets. Multiple imputation (MI) offers a principled approach to handling missing data that allows valid statistical inference. However, most MI algorithms are not designed to accommodate the characteristics of high-frequency, multivariate time series such as those produced by EC systems, and their performance in this context remains under-explored. To address this gap, this study evaluates the performance of three MI methods tailored for EC datasets: (1) EMB, an MI algorithm based on Expectation-Maximization with Bootstrapping (Honaker and King, 2010), which assumes multivariate normality of the data; to meet this assumption, it integrates time series features such as polynomial trends, lagged variables, and hydro-ecological regime segmentation, as detailed in Vitale et al. (2018); (2) RF, a random forest-based MI (Doove et al., 2014) implemented within a fully conditional specification framework, which captures nonlinearities and variable interactions without requiring strong distributional assumptions; and (3) XGB, an MI method based on Extreme Gradient Boosting (Deng and Lumley, 2023), a scalable and efficient implementation of gradient boosting that is particularly suited to high-dimensional, structured datasets such as EC time series. Performance was assessed via Monte Carlo simulations using the FLUXNET2015 dataset (Pastorello et al., 2020). In addition to comparing algorithmic performance, the study reviews current imputation evaluation metrics, emphasizing the need for robust, context-specific criteria for environmental time series. The findings provide guidance for selecting MI strategies in EC data processing pipelines, with implications for improving the accuracy of ecosystem carbon balance estimates.
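The abstract does not say how estimates from the individual imputed datasets are combined. A common choice in the MI literature is Rubin's rules, which pool the point estimates and add the between-imputation variance to the average within-imputation variance. The following is a minimal sketch under that assumption, not the authors' procedure; the function name `pool_rubin` and all numbers are hypothetical and serve only as illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Combine m completed-data estimates with Rubin's rules.

    estimates: point estimate from each imputed dataset (e.g. an annual sum).
    within_variances: sampling variance of each of those estimates.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance of the pooled estimate
    return q_bar, t


# Hypothetical annual CO2 budgets (g C m-2 yr-1) from m = 5 imputed series;
# these values are made up and do not come from the study.
budgets = [-212.0, -198.0, -205.0, -220.0, -201.0]
variances = [36.0, 40.0, 38.0, 35.0, 41.0]
estimate, total_var = pool_rubin(budgets, variances)
print(f"pooled budget: {estimate:.1f}, standard error: {total_var ** 0.5:.1f}")
```

The between-imputation term is what distinguishes MI from single imputation: it carries the uncertainty due to the gaps into the standard error of the annual budget.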
Publication type: 04 Publication in conference proceedings::04d Abstract in conference proceedings
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1745481