A Multiple Imputation Strategy for Eddy Covariance Data / Vitale, Domenico; Bilancia, Massimo; Papale, Dario. - In: JOURNAL OF ENVIRONMENTAL INFORMATICS. - ISSN 1726-2135. - 34:2(2019), pp. 68-87. [10.3808/jei.201800391]

A Multiple Imputation Strategy for Eddy Covariance Data

Domenico Vitale, Massimo Bilancia, Dario Papale
2019

Abstract

Half-hourly time series of net ecosystem exchange (NEE) of CO2, latent heat flux (LE) and sensible heat flux (H) measured with the micrometeorological eddy covariance (EC) technique are noisy and contain a high percentage of missing data. Using EC measurements from the FLUXNET2015 dataset, we evaluate the performance of a multiple imputation (MI) strategy based on the efficient computational approach introduced by Honaker and King (2010), which combines the classic Expectation-Maximization (EM) algorithm with a bootstrap in order to take draws from a suitable approximation of the posterior distribution of the model parameters. With these tools, we introduce three new multiple imputation models of increasing complexity, all built on a multivariate normality assumption: 1) MLR, which imputes missing EC values using a static multiple linear regression on observed values of suitable input variables; 2) ADL, which adds dynamic properties to the static MLR specification through an autoregressive distributed lag specification; 3) PADL, which adds further complexity by embedding the ADL model in a panel-data framework. Under several artificial gap scenarios, we show that PADL is better able to model the complex dynamics of ecosystem fluxes and to reconstruct missing data points, thus providing unbiased imputations and preserving the original sampling distribution. The added flexibility arising from the time-series cross-section structure of PADL yields improved performance: it outperforms the other imputation methods as well as the marginal distribution sampling (MDS) algorithm, a widely used gap-filling approach introduced by Reichstein et al. (2005), especially for nighttime flux data.
We expect the strategy proposed in this paper to be useful for creating multiple imputations for a variety of EC datasets, providing valid inferences for a broad range of scientific estimands (such as annual budgets).
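To make the idea of bootstrap-based multiple imputation with a static regression more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a deliberate simplification: the input variables are fully observed and only the flux has gaps, so the EM step reduces to ordinary least squares on the observed rows. The general EM-with-bootstrap algorithm of Honaker and King (2010) also handles gaps in the inputs and is not reproduced here. The function name `bootstrap_mi` and all parameter names are hypothetical.

```python
import numpy as np

def bootstrap_mi(y, X, m=5, seed=0):
    """Return an (m, n) array holding m completed copies of y.

    Simplifying assumption: X has no missing values, so the regression
    can be fitted by least squares on each bootstrap sample instead of EM.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    obs = np.flatnonzero(~np.isnan(y))   # rows with an observed flux
    mis = np.flatnonzero(np.isnan(y))    # rows to impute
    completed = np.tile(y, (m, 1))
    for k in range(m):
        # Bootstrap the observed rows to reflect parameter uncertainty.
        idx = rng.choice(obs, size=obs.size, replace=True)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        resid = y[idx] - X[idx] @ beta
        dof = max(idx.size - X.shape[1], 1)
        sigma = np.sqrt(resid @ resid / dof)
        # Impute: conditional mean plus a Gaussian residual draw.
        noise = rng.normal(0.0, sigma, size=mis.size)
        completed[k, mis] = X[mis] @ beta + noise
    return completed

# Synthetic illustration (made-up data, not FLUXNET2015 measurements).
rng = np.random.default_rng(42)
n = 500
drivers = rng.normal(size=(n, 2))
flux = 1.0 + 2.0 * drivers[:, 0] - 0.5 * drivers[:, 1] + rng.normal(0, 0.3, n)
flux_gappy = flux.copy()
flux_gappy[rng.random(n) < 0.3] = np.nan   # artificial gaps

imputations = bootstrap_mi(flux_gappy, drivers, m=20)
pooled = imputations.mean(axis=0)           # point estimate per time step
spread = imputations.std(axis=0)            # between-imputation spread
```

Under the same assumptions, a dynamic variant in the spirit of ADL would add lagged copies of the flux and of the inputs as extra columns of `X`. Lagged fluxes can themselves be missing, which is where an EM-based fit becomes necessary.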
Keywords: eddy covariance; net ecosystem exchange; carbon budget; missing data; multiple imputations; Expectation-Maximization (EM) algorithm; panel autoregressive distributed lag model (PADL)
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1665162