Discrete Mathematics for Data Imputation / Bruni, Renato. - Print. - (2002). (Paper presented at the Second SIAM International Conference on Data Mining, Workshop on Discrete Mathematics and Data Mining, held in Arlington, Virginia, USA).

Discrete Mathematics for Data Imputation

BRUNI, Renato
2002

Abstract

The paper is concerned with the problem of automatically detecting and correcting inconsistent or out-of-range data in a general process of statistical data collection. As is customary, erroneous data records are detected by formulating a set of rules. Erroneous records should then be corrected by modifying the erroneous data as little as possible, while causing minimum perturbation to the original frequency distributions of the data. This process is called imputation, and it constitutes an optimization problem. By encoding the rules as linear inequalities, we convert imputation problems into integer programming problems. The proposed procedure is tested on a real-world census case. Results are extremely encouraging from both the computational and the data quality points of view.
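To make the idea of rules encoded as linear inequalities concrete, here is a minimal sketch in Python. It is not the paper's procedure: the three binary fields, the two rules, and the names `RULES`, `is_consistent`, and `impute` are hypothetical, and exhaustive enumeration stands in for an integer programming solver. It only covers the "change as little as possible" objective, and does not attempt to preserve frequency distributions.

```python
from itertools import product

# Hypothetical edit rules, each written as a linear inequality a . x <= b
# over binary fields x = (is_child, is_married, is_employed).
RULES = [
    ((1, 1, 0), 1),  # is_child + is_married  <= 1
    ((1, 0, 1), 1),  # is_child + is_employed <= 1
]


def is_consistent(x, rules=RULES):
    """True if the binary record x satisfies every inequality a . x <= b."""
    return all(
        sum(a_i * x_i for a_i, x_i in zip(a, x)) <= b
        for a, b in rules
    )


def impute(record, rules=RULES, weights=None):
    """Return (corrected_record, cost) for the consistent record closest to
    `record`, where cost is the weighted number of changed fields.

    Brute force over all 2^n binary vectors: an illustrative stand-in for
    solving the corresponding integer program with a real solver.
    """
    n = len(record)
    weights = weights or [1] * n
    best, best_cost = None, None
    for x in product((0, 1), repeat=n):
        if not is_consistent(x, rules):
            continue
        # For binary fields, the weighted count of changed fields is a
        # linear objective in x once the original record is fixed.
        cost = sum(w for w, x_i, r_i in zip(weights, x, record) if x_i != r_i)
        if best_cost is None or cost < best_cost:
            best, best_cost = x, cost
    return best, best_cost


if __name__ == "__main__":
    erroneous = (1, 1, 1)  # a child who is both married and employed
    print(is_consistent(erroneous))
    print(impute(erroneous))
    print(impute(erroneous, weights=[5, 1, 1]))
```

With unit weights, the cheapest repair changes only the `is_child` field. Raising the weight of that field, as in the last call, expresses more trust in it and makes the sketch change the other two fields instead.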

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/498231