Designing the ELEXIS Parallel Sense-Annotated Dataset in 10 European Languages

Federico Martelli; Roberto Navigli; Simon Krek; Jelena Kallas; Polona Gantar; Svetla Koeva; Sanni Nimb; Bolette Sandford Pedersen; Sussi Olsen; Margit Langemets; Kristina Koppel; Tiiu Üksik; Kaja Dobrovoljc; Rafael-J. Ureña-Ruiz; José-Luis Sancho-Sánchez; Veronika Lipp; Tamás Váradi; András Győrffy; Simon László; Valeria Quochi; Monica Monachini; Francesca Frontini; Carole Tiberius; Rob Tempelaars; Rute Costa; Ana Salgado; Jaka Čibej; Tina Munda

2021

Abstract

Over the last few years, lexicography has witnessed the burgeoning of increasingly reliable automatic approaches supporting the creation of lexicographic resources such as dictionaries, lexical knowledge bases and annotated datasets. Recent achievements in Natural Language Processing, and particularly in Word Sense Disambiguation, have demonstrated their effectiveness not only for the creation of lexicographic resources, but also for enabling a deeper analysis of lexical-semantic data both within and across languages. Nevertheless, we argue that the potential of the connections between the two fields is far from exhausted. In this work, we address a serious limitation affecting both lexicography and Word Sense Disambiguation, namely the lack of high-quality sense-annotated data, and describe our efforts to construct a novel, entirely manually annotated parallel dataset in 10 European languages. For the purposes of the present paper, we concentrate on the annotation of morpho-syntactic features. Finally, unlike many of the currently available sense-annotated datasets, ours will be semantically annotated with senses derived from high-quality lexicographic repositories.

Full record

	Year of publication
				2021
	Conference name
				7th Biennial Conference on Electronic Lexicography, eLex 2021
	Keywords
				Digital lexicography; Natural Language Processing; Computational Linguistics; Corpus Linguistics; Word Sense Disambiguation
	Type
				04 Publication in conference proceedings::04b Conference paper in volume
	Citation
				Designing the ELEXIS Parallel Sense-Annotated Dataset in 10 European Languages / Martelli, F., Navigli, R., Krek, S., Kallas, J., Gantar, P., Koeva, S., Nimb, S., Sandford Pedersen, B., Olsen, S., Langemets, M., Koppel, K., Üksik, T., Dobrovoljc, K., Ureña-Ruiz, Rafael-J., Sancho-Sánchez, J., Lipp, V., Váradi, T., Győrffy, A., László, S., Quochi, V., et al. - (2021), pp. 377-395. (7th Biennial Conference on Electronic Lexicography, eLex 2021, Online).
	Belongs to type:
				04b Conference paper in volume

Files attached to this product

File	Access	Type	License	Size	Format
Martelli_Designing2021.pdf	Open access	Publisher's version (published version with the publisher's layout)	Creative Commons	558.03 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1604144
