Semantic representation lies at the core of computational lexical semantics, a key research field in Natural Language Processing. Because many applications in Natural Language Processing and Artificial Intelligence require a deeper understanding of linguistic units, semantic representation is considered one of their fundamental components. However, due mainly to the lack of large sense-annotated corpora, most existing representation techniques are limited to the lexical level and thus cannot be effectively applied to individual word senses. In this paper we put forward a novel multilingual vector representation, called NASARI, which not only enables accurate representation of word senses in different languages, but also provides two main advantages over existing approaches: (1) high coverage, including both concepts and named entities, and (2) comparability across languages and linguistic levels (i.e. words, senses and concepts), thanks to the representation of linguistic items in a single unified semantic space and in a joint embedded space, respectively. Moreover, our representations are flexible, can be applied to multiple applications and are freely available at http://lcl.uniroma1.it/nasari/. As an evaluation benchmark, we chose four different tasks, namely word similarity, sense clustering, domain labeling, and Word Sense Disambiguation, for each of which we report state-of-the-art performance on several standard datasets across different languages.
NASARI: Multilingual Semantically-grounded Distributional Vectors / Camacho Collados, José; Pilehvar, Mohammed Taher; Navigli, Roberto. - In: Artificial Intelligence. - ISSN 0004-3702. - Electronic. - (In press).
NASARI: Multilingual Semantically-grounded Distributional Vectors
Camacho Collados, José; Pilehvar, Mohammed Taher; Navigli, Roberto
In press