A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns / Solaro, N.; Barbiero, A.; Manzi, G.; Ferrari, P. A.. - In: JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION. - ISSN 0094-9655. - 88:18(2018), pp. 3588-3619. [10.1080/00949655.2018.1530773]
A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns
G. Manzi
2018
Abstract
An extensive simulation study is carried out to compare three nonparametric, single imputation methods in the presence of multiple data patterns. The ultimate goal is to provide useful hints for users who need to quickly pick the most effective imputation method among the following: Forward Imputation (ForImp), considered in two variants, namely ForImp with principal component analysis (PCA), which alternates PCA and the Nearest-Neighbour Imputation (NNI) method in a forward, sequential procedure, and ForImp with the Mahalanobis distance, which uses the Mahalanobis distance when performing NNI; the iterative PCA technique, which imputes all missing values simultaneously via PCA; and the missForest method, which is based on random forests and was developed for mixed-type data. The performance of these methods is compared under several data patterns characterized by different levels of kurtosis or skewness and by different correlation structures.
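To make the idea of imputing missing values simultaneously via PCA more concrete, the following is a minimal sketch of one common formulation of iterative PCA imputation. It is not the implementation evaluated in the article: the function name `iterative_pca_impute`, the column-mean initialization, the fixed number of components, and the stopping rule are all assumptions chosen for illustration, and no regularization is applied.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=200, tol=1e-8):
    """Fill NaNs in X by alternating a rank-k PCA fit and re-imputation.

    Hypothetical helper for illustration only; not the authors' code.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumption: start from the column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        mean = filled.mean(axis=0)
        # Rank-k reconstruction of the centred data via truncated SVD.
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mean
        # Update only the missing cells; observed values stay fixed.
        updated = np.where(missing, recon, filled)
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:
            break
    return filled
```

All missing cells are updated together at each pass, which is what distinguishes this scheme from a forward, sequential procedure such as ForImp. In practice the number of components would have to be chosen from the data, for example by cross-validation.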