The hyperlink structure of Wikipedia constitutes a key resource for many Natural Language Processing tasks and applications, as it provides several million semantic annotations of entities in context. Yet only a small fraction of mentions across the entire Wikipedia corpus is linked. In this paper we present the automatic construction and evaluation of a Semantically Enriched Wikipedia (SEW) in which the overall number of linked mentions has been more than tripled solely by exploiting the structure of Wikipedia itself and the wide-coverage sense inventory of BabelNet. As a result we obtain a sense-annotated corpus with more than 200 million annotations of over 4 million different concepts and named entities. We then show that our corpus leads to competitive results on multiple tasks, such as Entity Linking and Word Similarity.
Automatic Construction and Evaluation of a Large Semantically Enriched Wikipedia / Raganato, Alessandro; Delli Bovi, Claudio; Navigli, Roberto. - Electronic. - (2016), pp. 2894-2900. (Paper presented at the 25th International Joint Conference on Artificial Intelligence (IJCAI-16), held in New York City, NY, USA, July 9-15, 2016).
Automatic Construction and Evaluation of a Large Semantically Enriched Wikipedia
Raganato, Alessandro; Delli Bovi, Claudio; Navigli, Roberto
2016