Ontology Population from Raw Text Corpus for Open-Source Intelligence / Ganino, Giulio; Lembo, Domenico; Scafoglieri, Federico. - 10544:(2018), pp. 173-186. (Paper presented at the 17th International Conference on Web Engineering, ICWE 2017, held in Rome, Italy) [10.1007/978-3-319-74433-9_16].
Ontology Population from Raw Text Corpus for Open-Source Intelligence
Ganino, Giulio; Lembo, Domenico; Scafoglieri, Federico
2018
Abstract
Open-Source INTelligence (OSINT) is intelligence based on publicly available sources, such as news sites, blogs, and forums. The Web is the primary source of information, but once data are crawled from it, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but because of the vast number of documents available, automatic mechanisms are needed to populate them from the crawled text. In this paper, we present an approach for the automatic population of pre-defined ontologies based on the General Architecture for Text Engineering (GATE) system. We present some experimental results, which are encouraging in terms of correctly extracted ontology instances. Finally, we describe an alternative approach and additional experiments for one of the phases of our pipeline, which requires the use of pre-defined dictionaries for relevant entities. Thanks to this variant, we were able to reduce the manual effort required in this phase while still obtaining promising results.

File | Size | Format | Access
---|---|---|---
Ganino_Ontology-Population_2018.pdf | 268.29 kB | Adobe PDF | Archive managers only

Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.