A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns / Solaro, N.; Barbiero, A.; Manzi, G.; Ferrari, P. A.. - In: JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION. - ISSN 0094-9655. - 88:18(2018), pp. 3588-3619. [10.1080/00949655.2018.1530773]

A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns

G. MANZI;
2018

Abstract

An extensive investigation via simulation is carried out with the aim of comparing three nonparametric, single imputation methods in the presence of multiple data patterns. The ultimate goal is to provide useful hints for users needing to quickly pick the most effective imputation method among the following: Forward Imputation (ForImp), considered in the two variants of ForImp with the principal component analysis (PCA), which alternates the use of PCA and the Nearest-Neighbour Imputation (NNI) method in a forward, sequential procedure, and ForImp with the Mahalanobis distance, which involves the use of the Mahalanobis distance when performing NNI; the iterative PCA technique, which imputes missing values simultaneously via PCA; the missForest method, which is based on random forests and is developed for mixed-type data. The performance of these methods is compared under several data patterns characterized by different levels of kurtosis or skewness and correlation structures.
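To make the idea of imputing all missing values simultaneously via PCA more concrete, the following is a minimal sketch of a generic iterative PCA imputation loop. It is not the authors' implementation: the function name `iterative_pca_impute`, the number of retained components, the stopping tolerance, and the use of column means as starting values are all assumptions made for illustration.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=200, tol=1e-6):
    """Fill NaNs by alternating a low-rank PCA fit and re-imputation.

    Hypothetical helper for illustration only; parameters are assumptions.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Starting values: column means computed on the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # PCA of the currently completed matrix via SVD of the centred data.
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mu

        # Only missing cells are overwritten; observed cells stay untouched.
        new_filled = np.where(missing, recon, X)

        # Root-mean-square change of the imputed cells between iterations.
        if missing.any():
            change = np.sqrt(np.mean((new_filled - filled)[missing] ** 2))
        else:
            change = 0.0
        filled = new_filled
        if change < tol:
            break

    return filled


# Usage on a small synthetic matrix with three missing cells.
rng = np.random.default_rng(0)
t = rng.normal(size=30)
X_full = np.outer(t, [1.0, 2.0, -1.0]) + 5.0
X_miss = X_full.copy()
X_miss[[0, 5, 9], [0, 1, 2]] = np.nan
X_imputed = iterative_pca_impute(X_miss, n_components=1)
```

In this sketch every missing cell is updated at each pass from the same low-rank reconstruction, in contrast with a forward, sequential scheme that fills in units step by step.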
forward imputation; iterative principal component analysis; Mahalanobis distance; missForest; missing data; Monte Carlo simulation; multivariate exponential power distribution; multivariate skew-normal distribution; nearest-neighbour imputation
01 Journal publication::01a Journal article

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1727310