Research Products Catalogue

Ontology Population from Raw Text Corpus for Open-Source Intelligence / Ganino, Giulio; Lembo, Domenico; Scafoglieri, Federico. - 10544:(2018), pp. 173-186. (Paper presented at the 17th International Conference on Web Engineering, ICWE 2017, held in Rome, Italy) [10.1007/978-3-319-74433-9_16].

Ontology Population from Raw Text Corpus for Open-Source Intelligence

Ganino, Giulio; Lembo, Domenico; Scafoglieri, Federico

2018

Abstract

Open-Source INTelligence (OSINT) is intelligence based on publicly available sources, such as news sites, blogs, and forums. The Web is the primary source of information, but once data are crawled from it, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but given the vast number of documents available, automatic mechanisms are needed to populate them from the crawled text. In this paper, we present an approach for the automatic population of pre-defined ontologies based on the General Architecture for Text Engineering (GATE) system. We present some experimental results, which are encouraging in terms of the correct ontology instances extracted. Finally, we describe an alternative approach and additional experiments for one of the phases of our pipeline, which requires the use of pre-defined dictionaries for relevant entities. With this variant, we were able to reduce the manual effort required in this phase while still obtaining promising results.

Full record

	Year of publication
	
			2018
		
	Conference name
	
			17th International Conference on Web Engineering, ICWE 2017
		
	Keywords
	
			Ontology; Open source software
		
	Type
	
			04 Publication in conference proceedings::04b Conference paper in volume
		
	Belongs to type:
	
			04b Conference paper in volume

Files attached to this product

File	Size	Format
Ganino_Ontology-Population_2018.pdf (access: archive managers only; type: publisher's version, published with the publisher's layout; license: all rights reserved)	268.29 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1079315
