Detection of Outliers in Large Databases / Bruni, Renato; Canale, S.; Sassano, Antonio. - PRINT. - (2002). (Paper presented at the International Workshop on Mathematical Diagnostics, held in Erice, Italy).

Detection of Outliers in Large Databases

BRUNI, Renato; SASSANO, Antonio
2002

Abstract

Real-world databases very often contain records that are anomalous, or atypical, in the sense that they do not respect some contextual rules that are satisfied by normal records. Such records are sometimes called outliers, or peculiar data. This paper is concerned with the problem of automatically detecting such anomalous data. Detection is performed by means of a set of rules expressed in a suitable formal system. Rules can either be provided by human experts or be generated automatically. In particular, we present our experience with data imputation and with fraud detection. In any statistical data collection process, such as statistical surveys, marketing analyses, or experimental measurements, erroneous data should be detected and corrected. Here, erroneous records are detected by formulating a set of rules. The errors should then be corrected by modifying the erroneous data as little as possible, while causing minimum perturbation to the original frequency distributions of the data. This process is called imputation. By encoding the rules as linear inequalities, we convert imputation problems into integer programming problems. The proposed procedure is tested on a real-world census case. The results are extremely encouraging from both the computational and the data-quality points of view. Service providers, such as telecommunications companies, financial institutions, and insurance agencies, usually suffer costly losses due to fraud. This motivates the recent rise of interest in fraud detection, that is, detecting the largest possible number of fraud cases while minimizing the number of false alarms. The rules used in this case are generated either by a human expert or by a machine learning approach. Both methodologies are analyzed. The problem of inconsistencies and redundancies in the rule set is also discussed.
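The abstract describes encoding the rules as linear inequalities so that imputation becomes an integer programming problem. What follows is a minimal Python sketch of that idea, not the authors' implementation, under these assumptions: the three-field census-like schema, the three rules, and the names `DOMAINS`, `RULES`, `satisfies` and `impute` are all hypothetical; each categorical field is one-hot encoded as 0-1 variables; the objective is only the (optionally weighted) number of changed fields, so the second criterion in the abstract, minimum perturbation of the frequency distributions, is not modeled; and brute-force enumeration stands in for an actual integer-programming solver.

```python
from itertools import product

# Hypothetical census-like schema: each field takes exactly one value from its domain.
# In the 0-1 encoding, x[(field, value)] = 1 iff record[field] == value (one-hot per field).
DOMAINS = {
    "age": ["0-13", "14+"],
    "marital": ["single", "married", "widowed"],
    "relation": ["head", "spouse", "child"],
}

# Each rule is a linear inequality  sum(coef * x[(field, value)]) <= rhs.
# These three rules are illustrative assumptions, not the rules used in the paper.
RULES = [
    # age 0-13 implies single:   x[age=0-13] - x[marital=single] <= 0
    ({("age", "0-13"): 1, ("marital", "single"): -1}, 0),
    # spouse implies married:    x[relation=spouse] - x[marital=married] <= 0
    ({("relation", "spouse"): 1, ("marital", "married"): -1}, 0),
    # age 0-13 implies child:    x[age=0-13] - x[relation=child] <= 0
    ({("age", "0-13"): 1, ("relation", "child"): -1}, 0),
]


def satisfies(record, rules):
    """Return True if the record's 0-1 encoding satisfies every linear inequality."""
    for coeffs, rhs in rules:
        # Only indicators equal to 1 (field holds that value) contribute to the left side.
        lhs = sum(c for (field, value), c in coeffs.items() if record[field] == value)
        if lhs > rhs:
            return False
    return True


def impute(record, rules, domains, weights=None):
    """Find a rule-satisfying record minimizing the weighted number of changed fields.

    Exhaustive enumeration replaces the integer-programming solver; it is only
    practical for tiny domains. `weights` holds hypothetical per-field change costs
    (default 1). Returns (corrected_record, cost), or None if no record satisfies
    all rules, i.e. the rule set itself is inconsistent.
    """
    weights = weights or {}
    fields = list(domains)
    best = None
    for values in product(*(domains[f] for f in fields)):
        candidate = dict(zip(fields, values))
        if not satisfies(candidate, rules):
            continue
        cost = sum(weights.get(f, 1) for f in fields if candidate[f] != record[f])
        if best is None or cost < best[1]:
            best = (candidate, cost)
    return best


if __name__ == "__main__":
    # A 0-13 year old recorded as a spouse violates the second and third rules.
    record = {"age": "0-13", "marital": "single", "relation": "spouse"}
    print(satisfies(record, RULES))  # False
    print(impute(record, RULES, DOMAINS))
    # → ({'age': '0-13', 'marital': 'single', 'relation': 'child'}, 1)
```

Enumeration grows exponentially with the number of fields, which is why a realistic system would pass the same inequalities and objective to an integer-programming solver instead. The `None` return also gives a crude check for one of the rule-set problems mentioned in the abstract: if no record at all satisfies the rules, they are mutually inconsistent.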


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/498829