Evaluation dataset and methodology for extracting application-specific taxonomies from the Wikipedia knowledge graph / Bordea, G., Faralli, S., Mougin, F., Buitelaar, P., Diallo, G. - (2020), pp. 2341-2347. (12th International Conference on Language Resources and Evaluation, LREC 2020, Palais du Pharo, France).

Evaluation dataset and methodology for extracting application-specific taxonomies from the Wikipedia knowledge graph

Faralli S. (co-first author); Diallo G. (co-first author)

2020

Abstract

In this work, we address the task of extracting application-specific taxonomies from the category hierarchy of Wikipedia. Previous work on pruning the Wikipedia knowledge graph relied on silver-standard taxonomies, which can only be extracted automatically for a small subset of domains rooted in relatively focused nodes placed at an intermediate level in the knowledge graph. We propose an iterative methodology to extract an application-specific gold-standard dataset from a knowledge graph, together with an evaluation framework to comparatively assess the quality of noisy, automatically extracted taxonomies. We apply an existing state-of-the-art algorithm iteratively and propose several sampling strategies to reduce the amount of manual work needed for evaluation. A first gold-standard dataset for this task is released to the research community along with a companion evaluation framework. The dataset addresses a real-world application from the medical domain, namely the extraction of food-drug and herb-drug interactions.

Full record

Year of publication: 2020
Conference name: 12th International Conference on Language Resources and Evaluation, LREC 2020
Keywords: Knowledge Graph Pruning; Knowledge Graphs; Taxonomy Extraction
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this product

There are no files associated with this product.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1622759

Warning: the data shown have not been validated by the university.
