Catalogo dei prodotti della ricerca

In many research fields such as Psychology, Linguistics, Cognitive Science and Artificial Intelligence, computing semantic similarity between words is an important issue. In this paper a new semantic similarity metric, that exploits some notions of the feature-based theory of similarity and translates it into the information theoretic domain, which lever-ages the notion of Information Content (IC), is presented. In particular, the proposed metric exploits the notion of intrinsic IC which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. In order to evaluate this metric, an on line experiment asking the community of researchers to rank a list of 65 word pairs has been conducted. The experiment's web setup allowed to collect 101 similarity ratings and to differentiate native and non-native English speakers. Such a large and diverse dataset enables to confidently evaluate similarity metrics by correlating them with human assessments. Experimental evaluations using WordNet indicate that the proposed metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. In order to investigate the generality of both the intrinsic IC formulation and proposed similarity metric a further evaluation using the MeSH biomedical ontology has been performed. Even in this case significant results were obtained. The proposed metric and several others have been implemented in the Java WordNet Similarity Library. (C) 2009 Elsevier B.V. All rights reserved.

A semantic similarity metric combining features and intrinsic information content / Pirró, Giuseppe. - In: DATA & KNOWLEDGE ENGINEERING. - ISSN 0169-023X. - 68:11(2009), pp. 1289-1308. [10.1016/j.datak.2009.06.008]

A semantic similarity metric combining features and intrinsic information content

Pirró, Giuseppe

2009

Abstract

In many research fields such as Psychology, Linguistics, Cognitive Science and Artificial Intelligence, computing semantic similarity between words is an important issue. In this paper a new semantic similarity metric, that exploits some notions of the feature-based theory of similarity and translates it into the information theoretic domain, which lever-ages the notion of Information Content (IC), is presented. In particular, the proposed metric exploits the notion of intrinsic IC which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. In order to evaluate this metric, an on line experiment asking the community of researchers to rank a list of 65 word pairs has been conducted. The experiment's web setup allowed to collect 101 similarity ratings and to differentiate native and non-native English speakers. Such a large and diverse dataset enables to confidently evaluate similarity metrics by correlating them with human assessments. Experimental evaluations using WordNet indicate that the proposed metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. In order to investigate the generality of both the intrinsic IC formulation and proposed similarity metric a further evaluation using the MeSH biomedical ontology has been performed. Even in this case significant results were obtained. The proposed metric and several others have been implemented in the Java WordNet Similarity Library. (C) 2009 Elsevier B.V. All rights reserved.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2009
			
	Parole chiave
	
				Semantic similarity; Intrinsic information content; Similarity experiment; Similarity on MeSH; Java WordNet Similarity Library
			
	Tipologia
	
				01 Pubblicazione su rivista::01a Articolo in rivista
			
	Citazione
	
				A semantic similarity metric combining features and intrinsic information content / Pirró, Giuseppe. - In: DATA & KNOWLEDGE ENGINEERING. - ISSN 0169-023X. - 68:11(2009), pp. 1289-1308. [10.1016/j.datak.2009.06.008]
			
	Appartiene alla tipologia:
	
				01a Articolo in rivista

File allegati a questo prodotto

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1274261

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

201

153

social impact