Parallel corpora are widely used in a variety of Natural Language Processing tasks, from Machine Translation to cross-lingual Word Sense Disambiguation, where parallel sentences can be exploited to automatically generate high-quality sense annotations on a large scale. In this paper we present EuroSense, a multilingual sense-annotated resource based on the joint disambiguation of the Europarl parallel corpus, with almost 123 million sense annotations for over 155 thousand distinct concepts and entities from a language-independent unified sense inventory. We evaluate the quality of our sense annotations intrinsically and extrinsically, showing their effectiveness as training data for Word Sense Disambiguation.
EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text / Delli Bovi, Claudio; Camacho Collados, José; Raganato, Alessandro; Navigli, Roberto. - Electronic. - 1:(2017). (Paper presented at the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), held in Vancouver, Canada, 30 July-4 August 2017) [10.18653/v1/P17-2094].
EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text
Delli Bovi, Claudio; Camacho Collados, José; Raganato, Alessandro; Navigli, Roberto
2017