
Copyright © 2022 ISEIS. All rights reserved

A Multiple Imputation Strategy for Eddy Covariance Data

D. Vitale1,*, M. Bilancia2, and D. Papale1

  1. Department for Innovation in Biological, Agro-food and Forest Systems (DIBAF), University of Tuscia, via San Camillo de Lellis, 01100 Viterbo, Italy
  2. Ionian Department of Law, Economics and Environment, University of Bari Aldo Moro, Via Lago Maggiore angolo Via Ancona, 74121 Taranto, Italy

*Corresponding author. Tel.: +39-(0)761-357044; fax: +39-(0)761-357389. E-mail address: (D. Vitale).


Half-hourly time series of net ecosystem exchange (NEE) of CO2, latent heat flux (LE), and sensible heat flux (H) measured with the micro-meteorological eddy covariance (EC) technique are noisy and contain a high percentage of missing data. Using EC measurements from the FLUXNET2015 dataset, we evaluate the performance of a multiple imputation (MI) strategy based on the efficient computational approach introduced by Honaker and King (2010), which combines the classic Expectation-Maximization (EM) algorithm with a bootstrap in order to take draws from a suitable approximation of the posterior distribution of the model parameters. Building on this approach, we introduce three new multiple imputation models of increasing complexity, all based on a multivariate normality assumption: 1) MLR, which imputes missing EC values using a static multiple linear regression on observed values of suitable input variables; 2) ADL, which adds dynamic properties to the static MLR specification through an autoregressive distributed lag specification; 3) PADL, which adds further complexity by embedding the ADL model in a panel-data framework. Under several artificial gap scenarios, we show that PADL is better able to model the complex dynamics of ecosystem fluxes and to reconstruct missing data points, providing unbiased imputations and preserving the original sampling distribution. The added flexibility arising from the time-series cross-section structure of PADL yields improved performance, outperforming the other imputation models as well as the marginal distribution sampling (MDS) algorithm, a widely used gap-filling approach introduced by Reichstein et al. (2005), especially for nighttime flux data.
We expect the proposed strategy to be useful for creating multiple imputations for a variety of EC datasets, supporting valid inferences for a broad range of scientific estimands (such as annual budgets).
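To make the EM-plus-bootstrap idea concrete, the following is a minimal Python sketch of the general scheme under a multivariate normality assumption, covering the static case only (no lags, no panel structure). It is not the authors' implementation: the function names (`conditional`, `em_mvn`, `emb_impute`), the synthetic flux and driver series, the fixed number of EM iterations, and the number of imputations are all assumptions made for illustration. Each imputation resamples rows with replacement, runs EM on the resample to obtain a mean vector and covariance matrix, and then draws the missing entries of the original data from their conditional normal distribution.

```python
import numpy as np

def conditional(mu, sigma, x, m):
    """Mean and covariance of the missing entries x[m] given the observed ones."""
    o = ~m
    if not o.any():  # whole row missing: fall back to the marginal distribution
        return mu[m], sigma[np.ix_(m, m)]
    coef = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
    mean = mu[m] + coef @ (x[o] - mu[o])
    cov = sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
    return mean, (cov + cov.T) / 2  # symmetrize against round-off

def em_mvn(X, n_iter=50):
    """EM estimates of the mean and covariance of a normal sample with NaNs."""
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0) + 1e-6)
    for _ in range(n_iter):  # fixed iteration count (assumption: no convergence test)
        filled = X.copy()
        extra = np.zeros((p, p))  # accumulated conditional covariances
        for i in np.flatnonzero(miss.any(axis=1)):
            m = miss[i]
            mean, cov = conditional(mu, sigma, X[i], m)
            filled[i, m] = mean              # E-step: expected missing values
            extra[np.ix_(m, m)] += cov       # E-step: their uncertainty
        mu = filled.mean(axis=0)             # M-step: update the mean
        centered = filled - mu
        sigma = (centered.T @ centered + extra) / n  # M-step: update the covariance
    return mu, sigma

def emb_impute(X, n_imputations=5, seed=0):
    """Return a list of completed copies of X, one per bootstrap replicate."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    rows_with_gaps = np.flatnonzero(np.isnan(X).any(axis=1))
    completed = []
    for _ in range(n_imputations):
        boot = X[rng.integers(0, n, size=n)]  # bootstrap: resample rows
        mu, sigma = em_mvn(boot)              # EM on the resampled data
        Xi = X.copy()
        for i in rows_with_gaps:
            m = np.isnan(X[i])
            mean, cov = conditional(mu, sigma, X[i], m)
            Xi[i, m] = rng.multivariate_normal(mean, cov)  # draw, not just the mean
        completed.append(Xi)
    return completed

# Synthetic example (hypothetical data): a flux that depends linearly on one driver.
rng = np.random.default_rng(1)
driver = rng.normal(size=300)
flux = 2.0 * driver + rng.normal(scale=0.5, size=300)
X = np.column_stack([flux, driver])
X[rng.random(300) < 0.3, 0] = np.nan  # artificial gaps in the flux column

imputations = emb_impute(X)
flux_means = [Xi[:, 0].mean() for Xi in imputations]
pooled_mean = float(np.mean(flux_means))  # point estimate pooled across imputations
```

The spread of `flux_means` across the completed datasets reflects the uncertainty due to the missing values, which is what allows pooled estimates of quantities such as budgets to carry an honest error term. A dynamic variant in the spirit of ADL could be sketched by appending lagged copies of the columns to `X` before imputation, though that extension is not shown here.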

Keywords: eddy covariance, net ecosystem exchange, carbon budget, missing data, multiple imputations, Expectation-Maximization (EM) algorithm, panel autoregressive distributed lag model (PADL).
