Research Products Catalogue

Addressing Missing Data in Eddy-Covariance Time Series: A Comparative Study of Multiple Imputation Strategies / Vitale, Domenico; Tancredi, Andrea. - (2025), pp. 131-131. ( GRASPA2025 Rome; Italy ).

Addressing Missing Data in Eddy-Covariance Time Series: A Comparative Study of Multiple Imputation Strategies

Domenico Vitale; Andrea Tancredi

2025

Abstract

In the era of climate change, continuous monitoring of greenhouse gas (GHG) exchanges between terrestrial ecosystems and the atmosphere is crucial. A widely used technique for large-scale, long-term monitoring is the Eddy-Covariance (EC) method, which produces half-hourly time series of surface fluxes for the main GHGs, such as water vapor (H₂O), carbon dioxide (CO₂), methane (CH₄), and nitrous oxide (N₂O), along with various micro-meteorological variables. Despite significant advances in sensor technology and data acquisition systems, EC time series frequently contain substantial data gaps (20-60% annually) due to sensor limitations, adverse conditions, and strict quality control. These gaps hinder critical analyses such as the estimation of annual GHG budgets. Multiple imputation (MI) offers a principled approach to handling missing data that allows valid statistical inference. However, most MI algorithms are not designed to accommodate the characteristics of high-frequency, multivariate time series such as those produced by EC systems, and their performance in this context remains under-explored. To address this gap, this study evaluates the performance of three MI methods tailored for EC datasets:
1) EMB, an MI algorithm based on Expectation-Maximization with Bootstrapping (Honaker and King, 2010), which assumes multivariate normality of the data. It integrates time series features such as polynomial trends, lagged variables, and hydro-ecological regime segmentation, as detailed in Vitale et al. (2018), to meet the model assumptions.
2) RF, a random forest-based MI (Doove et al., 2014) implemented within a fully conditional specification framework, which captures nonlinearities and variable interactions without requiring strong distributional assumptions.
3) XGB, an MI method based on Extreme Gradient Boosting (Deng & Lumley, 2023), a scalable and efficient implementation of gradient boosting, particularly suited for high-dimensional, structured datasets such as EC time series.
Performance was assessed via Monte Carlo simulations using the FLUXNET2015 dataset (Pastorello et al., 2020). In addition to comparing algorithmic performance, the study reviews current imputation evaluation metrics, emphasizing the need for robust, context-specific criteria for environmental time series. The findings provide guidance for selecting MI strategies in EC data processing pipelines, with implications for improving the accuracy of ecosystem carbon balance estimates.
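
The abstract does not say how the multiply imputed series are combined into a single inference. A common choice in the MI literature is Rubin's rules, and the sketch below illustrates that choice only as an assumption, not as the authors' procedure. The function name `pool_rubin` and the budget values are hypothetical and serve purely as an illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: one point estimate per imputed dataset (e.g. an annual budget)
    within_variances: the sampling variance of each of those estimates
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)            # pooled point estimate
    w_bar = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total


# Hypothetical annual flux budgets from m = 5 imputed series (illustrative values only)
budgets = [-210.0, -205.0, -215.0, -208.0, -212.0]
within = [16.0, 15.0, 17.0, 16.0, 16.0]
estimate, total_var = pool_rubin(budgets, within)
```

The between-imputation term is what carries the uncertainty due to the gaps: the more the imputed series disagree, the larger the total variance of the pooled budget.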

	Year of publication
	
				2025
			
	Conference name
	
				GRASPA2025
			
	Type
	
				04 Publication in conference proceedings::04d Abstract in conference proceedings
			
	Citation
	
				Addressing Missing Data in Eddy-Covariance Time Series: A Comparative Study of Multiple Imputation Strategies / Vitale, Domenico; Tancredi, Andrea. - (2025), pp. 131-131. ( GRASPA2025 Rome; Italy ).
			
	Belongs to type:
	
				04d Abstract in conference proceedings

Files attached to this product

File	Size	Format
Vitale_Abstract-GRASPA_2025.pdf (archive managers only; Type: Publisher's version, i.e. the published version with the publisher's layout; License: All rights reserved)	617.28 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1745481

Warning

Warning! The data displayed have not been validated by the university.
