A Large-scale Pseudoword-based Evaluation Framework for State-of-the-Art Word Sense Disambiguation

Pilehvar, Mohammed Taher; Navigli, Roberto

doi:10.1162/COLI_a_00202

Abstract

The evaluation of several tasks in lexical semantics is often limited by the lack of large numbers of manual annotations, not only for training purposes, but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled data sets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not allow for in-depth analysis of the factors that determine a system’s performance. In this article we address this issue by means of a realistic simulation of large-scale evaluation for the WSD task. We do this by providing two main contributions: First, we put forward two novel approaches to the wide-coverage generation of semantically aware pseudowords (i.e., artificial words capable of modeling real polysemous words); second, we leverage the most suitable type of pseudoword to create large pseudosense-annotated corpora, which enable a large-scale experimental framework for the comparison of st

A Large-scale Pseudoword-based Evaluation Framework for State-of-the-Art Word Sense Disambiguation / Pilehvar, Mohammed Taher; Navigli, Roberto. - In: Computational Linguistics. - ISSN 1530-9312. - Electronic. - Vol. 40, No. 4 (2014), pp. 837-881. [10.1162/COLI_a_00202]

Year of publication: 2014
Keywords: Word Sense Disambiguation; Pseudowords
Type: 01 Journal publication::01a Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/645254

Note: the data displayed have not been validated by the university.