Research products catalogue

Object Matching (OM) is the problem of identifying pairs of data-objects that come from different sources and represent the same real-world object. Several methods have been proposed to solve OM problems, but none of them seems to be both fully automated and highly effective. In this paper we present a fundamentally new suite of methods that possesses both properties. We adopt a statistical approach based on mixture models, which structures an OM process into two consecutive tasks. First, mixture parameters are estimated by fitting the model to observed distance measures between pairs. Then, a probabilistic clustering of the pairs into Matches and Unmatches is obtained by exploiting the fitted model. In particular, we use a mixture model with component densities belonging to the Beta parametric family and we fit it by means of an original perturbation-like technique. Moreover, we solve the clustering problem according to both Maximum Likelihood and Minimum Cost objectives. To accomplish this task, optimal decision rules fulfilling one-to-one matching constraints are searched for by a purpose-built evolutionary algorithm. Notably, our suite of methods is distance-independent, in the sense that it does not rely on any restrictive assumption about the function used to compare data-objects. Even more interestingly, our approach is not confined to record linkage applications but can also be applied to match other kinds of data-objects. We present several experiments on real data that validate the proposed methods and show their excellent effectiveness. © 2010 IEEE.
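
As a rough illustration of the two-step process described in the abstract (fit a two-component Beta mixture to pairwise distances, then classify each pair by its posterior probability), the following Python sketch may help. It is not the authors' method: it swaps their perturbation-like fitting technique for a plain EM-style loop with a weighted method-of-moments update, and it ignores the one-to-one matching constraint and the evolutionary search entirely. All function names, starting values, and the 0.5 threshold are assumptions made for illustration, and distances are assumed to be normalised to the interval (0, 1).

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at a point x strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def moment_fit(xs, ws):
    """Weighted method-of-moments estimate of Beta parameters (an assumption,
    used here in place of the paper's fitting technique)."""
    total = max(sum(ws), 1e-12)
    mean = sum(w * x for x, w in zip(xs, ws)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / total
    var = max(var, 1e-6)  # guard against a degenerate component
    common = max(mean * (1 - mean) / var - 1, 1e-3)
    return mean * common, (1 - mean) * common


def fit_beta_mixture(distances, iters=50):
    """Return (match weight, match params, unmatch params, posteriors)."""
    # Clamp to the open interval so the logarithms are defined.
    xs = [min(max(d, 1e-6), 1 - 1e-6) for d in distances]
    p = 0.5                # hypothetical starting weight of the Match component
    match = (1.0, 5.0)     # starts concentrated near distance 0
    unmatch = (5.0, 1.0)   # starts concentrated near distance 1
    post = []
    for _ in range(iters):
        # E-step: posterior probability that each pair is a Match.
        post = []
        for x in xs:
            m = p * beta_pdf(x, *match)
            u = (1 - p) * beta_pdf(x, *unmatch)
            post.append(m / (m + u) if m + u > 0 else 0.5)
        # M-step: update the weight and both components.
        p = sum(post) / len(xs)
        match = moment_fit(xs, post)
        unmatch = moment_fit(xs, [1 - r for r in post])
    return p, match, unmatch, post


if __name__ == "__main__":
    random.seed(0)
    # Synthetic distances: 200 true matches (small), 800 true unmatches (large).
    data = [random.betavariate(2, 8) for _ in range(200)]
    data += [random.betavariate(8, 2) for _ in range(800)]
    p, match, unmatch, post = fit_beta_mixture(data)
    labels = ["Match" if r > 0.5 else "Unmatch" for r in post]
    print("estimated match weight:", round(p, 3))
    print("pairs labelled Match:", labels.count("Match"))
```

Thresholding the posterior at 0.5 corresponds to a simple maximum-posterior rule applied to each pair independently. A Minimum Cost rule would instead weight the two kinds of error differently, and enforcing one-to-one matching would require a joint decision over all pairs rather than this per-pair test.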

Effective automated object matching / Zardetto, Diego; Scannapieco, Monica; Catarci, Tiziana. - (2010), pp. 757-768. (Paper presented at the 26th IEEE International Conference on Data Engineering, ICDE 2010, held in Long Beach, United States, from 1 March 2010 through 6 March 2010) [10.1109/icde.2010.5447904].

Effective automated object matching

Diego Zardetto; Monica Scannapieco; Tiziana Catarci

2010

Year of publication	2010
Conference name	26th IEEE International Conference on Data Engineering, ICDE 2010
Keywords	Clustering problems; Component density; Distance measure
Type	04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this product

File	Size	Format
VE_2010_11573-209493.pdf (archive managers only; Type: Publisher's version, published with the publisher's layout; License: All rights reserved)	261.42 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/209493

Warning: the data displayed have not been validated by the university.
