Research Products Catalogue

Despite being one of the most popular tasks in lexical semantics, word similarity has often been limited to the English language. Other languages, even those that are widely spoken such as Spanish, do not have a reliable word similarity evaluation framework. We put forward robust methodologies for the extension of existing English datasets to other languages, both at monolingual and cross-lingual levels. We propose an automatic standardization for the construction of cross-lingual similarity datasets, and provide an evaluation, demonstrating its reliability and robustness. Based on our procedure and taking the RG-65 word similarity dataset as a reference, we release two high-quality Spanish and Farsi (Persian) monolingual datasets, and fifteen cross-lingual datasets for six languages: English, Spanish, French, German, Portuguese, and Farsi.

A framework for the construction of monolingual and cross-lingual Word Similarity Datasets / Camacho Collados, J., Pilehvar, M.T., Navigli, R. - Electronic. - 1:(2015), pp. 1-7. (ACL, Beijing, China, July 2015).

A framework for the construction of monolingual and cross-lingual Word Similarity Datasets

CAMACHO COLLADOS, JOSÉ; PILEHVAR, MOHAMMED TAHER; NAVIGLI, ROBERTO

2015

Full record

Year of publication	2015
Conference name	ACL
Keywords	Word Similarity; Natural Language Processing; Semantic Similarity
Type	04 Publication in conference proceedings::04b Conference paper in volume
Belongs to type:	04b Conference paper in volume

Files attached to this product

File	Size	Format
Camacho_Framework_2015.pdf (open access; Type: Publisher's version, published with the publisher's layout; License: All rights reserved)	748.69 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/845383
