Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs

Conia, Simone; Li, Min; Lee, Daniel; Farooq Minhas, Umar; Ilyas, Ihab; Li, Yunyao

doi:10.18653/v1/2023.emnlp-main.100

Recent work in Natural Language Processing and Computer Vision has been using textual information – e.g., entity names and descriptions – available in knowledge graphs to ground neural models to high-quality structured data. However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task of automatic Knowledge Graph Enhancement (KGE) and perform a thorough investigation on bridging the gap in both the quantity and quality of textual information between English and non-English languages. More specifically, we: i) bring to light the problem of increasing multilingual coverage and precision of entity names and descriptions in Wikidata; ii) demonstrate that state-of-the-art methods, namely, Machine Translation (MT), Web Search (WS), and Large Language Models (LLMs), struggle with this task; iii) present M-NTA, a novel unsupervised approach that combines MT, WS, and LLMs to generate high-quality textual information; and iv) study the impact of increasing multilingual coverage and precision of non-English textual information in Entity Linking, Knowledge Graph Completion, and Question Answering. As part of our effort towards better multilingual knowledge graphs, we also introduce WikiKGE-10, the first human-curated benchmark to evaluate KGE approaches in 10 languages across 7 language families.

Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs / Conia, Simone; Li, Min; Lee, Daniel; Farooq Minhas, Umar; Ilyas, Ihab; Li, Yunyao. - (2023), pp. 1612-1634. (Empirical Methods in Natural Language Processing, Singapore) [10.18653/v1/2023.emnlp-main.100].



Publication year: 2023
Conference name: Empirical Methods in Natural Language Processing
Keywords: knowledge graphs; multilingual; natural language processing; large language models; machine translation
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this record

File	Size	Format
Conia_Increasing_2023.pdf (open access; note: https://aclanthology.org/2023.emnlp-main.100.pdf; type: publisher's version, published with the publisher's layout; license: all rights reserved)	4.59 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1696397
