Ontology Population from Raw Text Corpus for Open-Source Intelligence / Ganino, Giulio; Lembo, Domenico; Scafoglieri, Federico. - 10544:(2018), pp. 173-186. (Paper presented at the 17th International Conference on Web Engineering, ICWE 2017, held in Rome, Italy) [10.1007/978-3-319-74433-9_16].

Ontology Population from Raw Text Corpus for Open-Source Intelligence

Ganino, Giulio; Lembo, Domenico; Scafoglieri, Federico
2018

Abstract

Open-Source INTelligence (OSINT) is intelligence based on publicly available sources, such as news sites, blogs, and forums. The Web is the primary source of information, but once data are crawled from it, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but given the vast number of documents available, automatic mechanisms are needed to populate them from the crawled text. In this paper, we present an approach for the automatic population of pre-defined ontologies based on the General Architecture for Text Engineering (GATE) system. We present some experimental results, which are encouraging in terms of the correct ontology instances extracted. Finally, we describe an alternative approach and additional experiments for the phase of our pipeline that requires pre-defined dictionaries of relevant entities. Thanks to this variant, we were able to reduce the manual effort required in this phase while still obtaining promising results.
17th International Conference on Web Engineering, ICWE 2017
Ontology; Open source software
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File: Ganino_Ontology-Population_2018.pdf
Access: repository administrators only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 268.29 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1079315
Citations
  • PMC: not available
  • Scopus: 3
  • ISI: 0