Optimization methods for the imputation of missing values in Educational Institutions Data

Aureli D.; Bruni R.; Daraio C.
2021

Abstract

The imputation of missing values in the detailed data of Educational Institutions is a difficult task. These data contain multivariate time series, which cannot be satisfactorily imputed by many existing imputation techniques. Moreover, almost all the data of an Institution are interconnected: the number of graduates is not independent of the number of students, the expenditure is not independent of the staff, etc. In other words, each imputed value has an impact on the whole set of data of the Institution. Therefore, imputation techniques for this specific case should be designed very carefully. We describe here the methods and the codes of the imputation methodology developed to impute the various patterns of missing values which appear in such interconnected data. The first part of the proposed methodology, called "trend smoothing imputation", is designed to impute missing values in time series while respecting the trend and the other features of an Institution. The second part, called "donor imputation", is designed to impute larger chunks of missing data by using values taken from similar Institutions, again in order to respect their size and trend.
  • Trend smoothing imputation can handle missing subsequences in time series, and is given by a weighted combination of: (a) a weighted average of the other available values of the sequence, and (b) linear regression.
  • Donor imputation can handle a full sequence missing in a time series. It imputes the Recipient Institution using the values taken from a similar Institution, called the Donor, selected using optimization criteria.
  • The values imputed by our techniques should respect the trend, the size and the ratios of each Institution.
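The two techniques summarized above can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not specify the weights or the optimization criteria, so the inverse-time-distance weights, the mixing parameter alpha, the L1 distance used to pick the Donor, and the rescaling by a size ratio are all assumptions made for illustration, as are the function and parameter names.

```python
import numpy as np

def trend_smoothing_impute(series, alpha=0.5):
    """Fill NaNs with alpha * weighted average + (1 - alpha) * linear fit.

    alpha and the inverse-distance weights are illustrative assumptions.
    """
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))
    known = ~np.isnan(y)
    if known.sum() < 2:
        raise ValueError("need at least two observed values")
    # (b) linear regression on the observed points
    slope, intercept = np.polyfit(t[known], y[known], deg=1)
    out = y.copy()
    for i in t[~known]:
        # (a) weighted average: observed values closer in time weigh more
        w = 1.0 / np.abs(t[known] - i)
        weighted_avg = np.sum(w * y[known]) / np.sum(w)
        out[i] = alpha * weighted_avg + (1 - alpha) * (slope * i + intercept)
    return out

def donor_impute(recipient_features, donor_features, donor_series):
    """Pick the closest donor and rescale its series to the recipient's size.

    Assumes L1 distance on observed features and a sum-of-features size
    ratio (which must be nonzero for the chosen donor); both are
    hypothetical stand-ins for the paper's optimization criteria.
    """
    r = np.asarray(recipient_features, dtype=float)
    d = np.asarray(donor_features, dtype=float)  # shape (n_donors, n_features)
    k = int(np.argmin(np.abs(d - r).sum(axis=1)))
    scale = r.sum() / d[k].sum()
    return k, scale * np.asarray(donor_series[k], dtype=float)
```

For example, trend_smoothing_impute([10, float("nan"), 14]) fills the gap using both the neighbouring values and the fitted line, while donor_impute returns the index of the selected Donor together with its rescaled sequence.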
Data imputation; Educational Institutions; Information Reconstruction; Interconnected data; Machine learning; Trend Smoothing Imputation and Donor Imputation
01 Journal publication::01a Journal article
Optimization methods for the imputation of missing values in Educational Institutions Data / Aureli, D.; Bruni, R.; Daraio, C.. - In: METHODSX (AMSTERDAM). - ISSN 2215-0161. - 8:(2021). [10.1016/j.mex.2020.101208]
Files attached to this item
File: Aureli_Optimization_2021.pdf
Access: open access
Note: https://doi.org/10.1016/j.mex.2020.101208
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 247.76 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1484860