Missing data imputation by Multitree / DI CIACCIO, Agostino. - Electronic. - 1:(2015), pp. 103-104. (Paper presented at the 2015 IFCS Conference, held in Bologna, 6-8 July.)
Missing data imputation by Multitree
DI CIACCIO, AGOSTINO
2015
Abstract
Big administrative datasets often contain a large number of variables with different measurement levels, together with many missing values. The appropriate way to handle these situations depends on the type of data and the purpose of the analysis. However, we cannot simply delete the incomplete records, because doing so discards a substantial amount of costly collected data. Single and multiple imputation serve different aims: single imputation creates an 'imputed' data matrix with the same characteristics as the observed data, while multiple imputation takes into account, when estimating a model, the additional variability due to the imputation process. Several approaches for big administrative data have been proposed in the literature. In this paper we compare different approaches, considering both single and multiple imputation, and we propose a new method, named Multitree. Through simulations, we show that Multitree is competitive with the best methods considered in the literature.
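The abstract notes that multiple imputation accounts for the extra variability introduced by the imputation process. One standard way to do this is Rubin's rules. Each of the m imputed datasets is analysed separately. The pooled estimate is the average of the m estimates, and the total variance adds the between-imputation spread to the average within-imputation variance. The sketch below is only an illustration of that general pooling step, under the assumption that each imputed dataset yields one scalar estimate and its variance. It is not the Multitree method, which the abstract does not describe, and the function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool m per-dataset estimates with Rubin's rules (illustrative sketch)."""
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputed datasets, one variance each")
    q_bar = mean(estimates)            # pooled point estimate
    w_bar = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b    # total variance, including imputation uncertainty
    return q_bar, total


# Hypothetical estimates of a mean obtained from m = 3 imputed datasets
q, t = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(q, t)
```

Here the pooled estimate is 11 and the total variance is about 2.33. If a single imputed matrix were treated as fully observed data, only the within-imputation variance (1.0) would be reported, which understates the uncertainty. This is why the two strategies suit different aims.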