Catalogue of research products

Weighted-distance sliding windows and cooccurrence graphs for supporting entity-relationship discovery in unstructured text / Fantozzi, Paolo; Laura, Luigi; Nanni, Umberto. - In: INTERNATIONAL JOURNAL OF INFORMATION, CONTROL & COMPUTER SCIENCES. - ISSN 2517-9942. - 8:12(2018), pp. 663-669. [10.5281/zenodo.1474421]

Weighted-distance sliding windows and cooccurrence graphs for supporting entity-relationship discovery in unstructured text

Paolo Fantozzi;Luigi Laura;Umberto Nanni

2018

Abstract

Entity-relation discovery, a topic well covered in the literature, consists of searching unstructured sources (typically text) to find connections among entities. The entities may form a whole dictionary or a specific collection of named items. Machine learning and text mining techniques are often used for this purpose, but they may be infeasible for computationally demanding problems such as processing massive data streams. A faster approach collects the cooccurrences of pairs of words (entities) to build a graph of relations, a cooccurrence graph. Each cooccurrence suggests some degree of semantic correlation between the two words, because related words are more likely to appear close to each other than at opposite ends of a text. Some authors have applied sliding windows to this problem: they count all cooccurrences within a window that slides over the whole text. In this paper we generalise this technique into a Weighted-Distance Sliding Window, in which each cooccurrence of two named items within the window is counted with a weight that depends on the distance between them: the closer the items, the stronger the evidence of a relationship. We support this intuition with an experiment that applies the technique to a data set consisting of the text of the Bible, split into verses.
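The weighted-distance idea can be illustrated with a short Python sketch. It is not the authors' implementation: the abstract specifies neither the weight function nor the window size, so the `window` parameter, the hypothetical `cooccurrence_graph` function, and the inverse-distance weight 1/d used in the example are all assumptions. Two entity occurrences at token positions i < j are taken to cooccur when j - i < window, and each such pair adds weight(j - i) to the edge between the two entities; with a constant weight of 1 this reduces to plain cooccurrence counting. The sketch counts each pair of positions once, rather than once per window position.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Set

# An undirected edge between two entities.
Edge = FrozenSet[str]


def cooccurrence_graph(
    tokens: List[str],
    entities: Set[str],
    window: int,
    weight: Callable[[int], float] = lambda d: 1.0,
) -> Dict[Edge, float]:
    """Hypothetical sketch of a (weighted-distance) cooccurrence graph.

    Two entity occurrences at positions i < j cooccur when j - i < window,
    i.e. when some run of `window` consecutive tokens contains both.
    Each such pair adds weight(j - i) to the edge between the entities.
    The default constant weight gives plain cooccurrence counts; the
    weight function itself is an assumption, not taken from the paper.
    """
    graph: Dict[Edge, float] = defaultdict(float)
    # Positions (in text order) of tokens that belong to the entity set.
    positions = [i for i, t in enumerate(tokens) if t in entities]
    for a, i in enumerate(positions):
        for j in positions[a + 1:]:
            d = j - i
            if d >= window:
                break  # positions are sorted, so later ones are even farther
            if tokens[i] != tokens[j]:  # skip self-loops
                graph[frozenset((tokens[i], tokens[j]))] += weight(d)
    return dict(graph)


if __name__ == "__main__":
    # Toy example invented for illustration: three entities in one sentence.
    tokens = ["moses", "led", "israel", "out", "of", "egypt"]
    entities = {"moses", "israel", "egypt"}
    print(cooccurrence_graph(tokens, entities, window=5))  # plain counts
    print(cooccurrence_graph(tokens, entities, window=5,
                             weight=lambda d: 1.0 / d))  # inverse distance
```

Because entity positions are scanned in text order, the inner loop stops at the first occurrence outside the window, so the work is proportional to the number of entity pairs that actually fall within the window rather than to all pairs in the text.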

Year of publication
2018

Keywords
cooccurrence graph; entity relation graph; unstructured text; weighted distance

Type
01 Journal publication::01a Journal article

Belongs to type:
01a Journal article

Files attached to this product

File	Size	Format
Fantozzi_Weighted-Distance_2018.pdf (open access; type: publisher's version, i.e. the version published with the publisher's layout; licence: Creative Commons)	274.33 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1427169
