Imputation techniques for the reconstruction of missing interconnected data from higher Educational Institutions / Bruni, R.; Daraio, C.; Aureli, D.. - In: KNOWLEDGE-BASED SYSTEMS. - ISSN 0950-7051. - (2020), p. 106512. [10.1016/j.knosys.2020.106512]

Imputation techniques for the reconstruction of missing interconnected data from higher Educational Institutions

Bruni R.; Daraio C.; Aureli D.
2020

Abstract

Data on Educational Institutions form the basis for several important analyses of educational systems; however, they often contain non-negligible shares of missing values, for several reasons. In this work we consider the relevant case of the European Tertiary Education Register (ETER), which describes the Educational Institutions of Europe. The presence of missing values prevents the full exploitation of this database, since several types of analyses that could otherwise be performed are currently impracticable. Imputing artificial data, reconstructed so as to be statistically equivalent to the (unknown) missing data, would make it possible to overcome these problems. A main complication in imputing this type of data is the correlations that exist among all the variables. We propose several imputation techniques designed to deal with the different types of missing values appearing in these interconnected data, and we use these techniques to impute the database. Moreover, we evaluate the accuracy of the proposed approach by artificially introducing missing data, imputing them, and comparing the imputed values with the original ones. Results show that the information reconstruction does not introduce statistically significant changes in the data and that the imputed values are close enough to the original values.
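The evaluation protocol summarized in the abstract (hide known values, impute them, compare with the originals) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy table is synthetic rather than ETER data, the function names are hypothetical, and the column-mean imputer is only a stand-in, not one of the techniques proposed in the paper.

```python
import random
import statistics


def mask_values(rows, fraction, rng):
    """Copy `rows`, hide a fraction of the cells (set to None), return copy and hidden positions."""
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    hidden = rng.sample(cells, k=max(1, int(fraction * len(cells))))
    masked = [list(row) for row in rows]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_column_mean(rows):
    """Stand-in imputer (NOT the paper's method): fill each gap with its column's observed mean."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(statistics.fmean(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def mean_absolute_error(original, imputed, hidden):
    """Average absolute gap between original and imputed values over the hidden cells only."""
    return statistics.fmean(abs(original[i][j] - imputed[i][j]) for i, j in hidden)


rng = random.Random(0)
# Synthetic toy table with two correlated columns (hypothetical: students, staff).
original = [[float(s), s / 10.0 + rng.uniform(-5, 5)] for s in range(100, 2100, 100)]

masked, hidden = mask_values(original, 0.2, rng)  # artificially introduce missing data
imputed = impute_column_mean(masked)              # impute them
mae = mean_absolute_error(original, imputed, hidden)  # compare with the original values
print(f"hidden cells: {len(hidden)}, mean absolute error: {mae:.2f}")
```

A column-mean imputer ignores the correlations among variables that the abstract identifies as the main complication, so it serves only as a baseline against which a correlation-aware technique could be compared under the same masking protocol.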
data imputation; Educational Institutions; information reconstruction; machine learning
01 Journal publication::01a Journal article
Files attached to this item

Bruni_Imputation_2020.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 964.12 kB
Format: Adobe PDF

Bruni_Pre-print_Imputation_2020.pdf
Access: open access
Type: Pre-print (manuscript submitted to the publisher, prior to peer review)
License: All rights reserved
Size: 1.2 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1466386
Citations
  • PubMed Central: not available
  • Scopus: 15
  • Web of Science: 11