In this work, we address the task of extracting application-specific taxonomies from the category hierarchy of Wikipedia. Previous work on pruning the Wikipedia knowledge graph relied on silver standard taxonomies, which can be extracted automatically only for a small subset of domains rooted in relatively focused nodes placed at an intermediate level of the knowledge graph. We propose an iterative methodology to extract an application-specific gold standard dataset from a knowledge graph, together with an evaluation framework to comparatively assess the quality of noisy, automatically extracted taxonomies. We apply an existing state-of-the-art algorithm iteratively and propose several sampling strategies to reduce the amount of manual work needed for evaluation. We release a first gold standard dataset for this task to the research community, along with a companion evaluation framework. The dataset addresses a real-world application from the medical domain, namely the extraction of food-drug and herb-drug interactions.

Evaluation dataset and methodology for extracting application-specific taxonomies from the Wikipedia knowledge graph / Bordea, G.; Faralli, S.; Mougin, F.; Buitelaar, P.; Diallo, G. (2020), pp. 2341-2347. (Paper presented at the 12th International Conference on Language Resources and Evaluation, LREC 2020, held at Palais du Pharo, France).

Evaluation dataset and methodology for extracting application-specific taxonomies from the Wikipedia knowledge graph

Faralli S. (co-first author)
2020

2020
12th International Conference on Language Resources and Evaluation, LREC 2020
Knowledge Graph Pruning; Knowledge Graphs; Taxonomy Extraction
04 Publication in conference proceedings::04b Conference paper in volume

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1622759
Warning: the data shown have not been validated by the university.

Citations
  • PMC: n/a
  • Scopus: 5
  • ISI: 3