In this paper we propose a new method for handling missing values in categorical data. The proposal is a forward imputation procedure, presented in the context of Nonlinear Principal Component Analysis used to obtain indicators from a large dataset. However, the procedure can easily be adopted in other contexts and with other multivariate techniques. We discuss the statistical features of our imputation technique in relation to other missing data treatment methods that are popular among Nonlinear Principal Component Analysis users. We then compare the performance of our method with that of the other methods through a simulation study based on a real dataset extracted from the Eurobarometer survey. Missing data are artificially created in the original data matrix, and the comparison is made in terms of how close the Nonlinear Principal Component Analysis outcomes obtained under each missing data treatment method are to those obtained from the original complete data. The new procedure provides better results than the other methods under the different conditions considered.
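The evaluation protocol described above (delete cells from a complete categorical matrix, treat the gaps, then check how close the resulting analysis is to the one on the complete data) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not the authors' procedure: the data are synthetic rather than Eurobarometer data, missingness is assumed to be completely at random, the treatment shown is plain per-variable mode imputation rather than the proposed forward imputation, and an ordinary linear PCA on the integer category codes stands in for Nonlinear Principal Component Analysis. All function names (`inject_mcar`, `mode_impute`, `first_pc_scores`, `closeness`) are hypothetical.

```python
import numpy as np


def inject_mcar(data, rate, rng):
    """Return a float copy of `data` with roughly a fraction `rate` of cells
    set to NaN completely at random, plus the boolean mask of deleted cells."""
    out = data.astype(float).copy()
    mask = rng.random(data.shape) < rate
    out[mask] = np.nan
    return out, mask


def mode_impute(data_nan):
    """Baseline treatment (assumption: NOT the paper's forward imputation):
    fill each column's gaps with its most frequent observed category."""
    out = data_nan.copy()
    for j in range(out.shape[1]):
        col = out[:, j]  # view into `out`, so assignments modify `out`
        observed = col[~np.isnan(col)]
        if observed.size == 0:
            continue  # nothing observed in this column; leave it untouched
        values, counts = np.unique(observed, return_counts=True)
        col[np.isnan(col)] = values[np.argmax(counts)]
    return out


def first_pc_scores(data):
    """Object scores on the first component of a linear PCA of the category
    codes (a simple stand-in for NLPCA, which would also rescale categories)."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def closeness(original, imputed):
    """Absolute correlation between first-component scores from the complete
    data and from the treated data (absolute, since the sign is arbitrary)."""
    a = first_pc_scores(original.astype(float))
    b = first_pc_scores(imputed.astype(float))
    return abs(np.corrcoef(a, b)[0, 1])


# Synthetic complete data: 200 units, 5 ordinal items with categories 1..4,
# all driven by one latent trait plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
noisy = latent[:, None] + rng.normal(scale=0.7, size=(200, 5))
original = np.digitize(noisy, bins=[-1.0, 0.0, 1.0]) + 1

with_gaps, mask = inject_mcar(original, rate=0.10, rng=rng)
imputed = mode_impute(with_gaps)
score = closeness(original, imputed)
print(f"deleted cells: {mask.sum()}, closeness to complete-data scores: {score:.3f}")
```

In a fuller comparison, one would repeat this over many random deletion patterns and several missingness rates, swap `mode_impute` for each treatment method under study, and compare the resulting distributions of the closeness measure.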
Handling Missing Data in Presence of Categorical Variables: a New Imputation Procedure / Ferrari, P. A.; Barbiero, A.; Manzi, G. - (2011), pp. 473-480. [10.1007/978-3-642-11363-5_53].
Handling Missing Data in Presence of Categorical Variables: a New Imputation Procedure
G. Manzi
2011