Research products catalogue

Efficient temporal mining of micro-blog texts and its application to event discovery

VELARDI, Paola; STILO, Giovanni

2016

Abstract

In this paper we present a novel method for clustering words in micro-blogs, based on the similarity of the related temporal series. Our technique, named SAX*, uses the Symbolic Aggregate ApproXimation algorithm to discretize the temporal series of terms into a small set of levels, leading to a string for each. We then define a subset of “interesting” strings, i.e. those representing patterns of collective attention. Sliding temporal windows are used to detect co-occurring clusters of tokens with the same or similar string. To assess the performance of the method we first tune the model parameters on a 2-month 1 % Twitter stream, during which a number of world-wide events of differing type and duration (sports, politics, disasters, health, and celebrities) occurred. Then, we evaluate the quality of all discovered events in a 1-year stream, “googling” with the most frequent cluster n-grams and manually assessing how many clusters correspond to published news in the same temporal slot. Finally, we perform a complexity evaluation and we compare SAX* with three alternative methods for event discovery. Our evaluation shows that SAX* is at least one order of magnitude less complex than other temporal and non-temporal approaches to micro-blog clustering. © 2015, The Author(s).
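The pipeline described in the abstract (discretize each term's time series into a string, keep only "interesting" strings, then group tokens that share a string inside a sliding window) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-letter alphabet, the `INTERESTING` regular expression, the function names, and the toy counts are all assumptions made for illustration, and the sketch groups only identical strings, whereas the paper also admits similar ones.

```python
import re
from collections import defaultdict
from statistics import NormalDist, mean, pstdev

# Hypothetical pattern of "collective attention": one short burst of
# activity ('b') surrounded by quiet periods ('a'). Not taken from the paper.
INTERESTING = re.compile(r"a*b{1,3}a*")


def sax_string(series, n_segments, alphabet="ab"):
    """Discretize a numeric series into a SAX string of n_segments symbols."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # A flat series has no shape; map it to the lowest symbol throughout.
        return alphabet[0] * n_segments
    z = [(x - mu) / sigma for x in series]  # z-normalization

    # Piecewise Aggregate Approximation: mean of each equal-length segment.
    # Assumes len(series) is a multiple of n_segments.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # Breakpoints cut the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


def cluster_window(token_series, n_segments):
    """Group tokens whose SAX string is interesting and identical."""
    clusters = defaultdict(list)
    for token, series in token_series.items():
        s = sax_string(series, n_segments)
        if INTERESTING.fullmatch(s):
            clusters[s].append(token)
    # Keep only strings shared by at least two tokens.
    return {s: sorted(toks) for s, toks in clusters.items() if len(toks) > 1}


def sliding_clusters(token_series, window, step, n_segments):
    """Yield (start index, clusters) for each sliding temporal window."""
    length = min(len(s) for s in token_series.values())
    for start in range(0, length - window + 1, step):
        chunk = {t: s[start:start + window] for t, s in token_series.items()}
        yield start, cluster_window(chunk, n_segments)


# Toy per-interval token counts (invented data).
counts = {
    "goal":    [1, 1, 1, 9, 8, 1, 1, 1],
    "penalty": [0, 0, 1, 7, 6, 0, 1, 0],
    "the":     [5, 5, 5, 5, 5, 5, 5, 5],
    "coffee":  [3, 9, 2, 8, 3, 9, 2, 8],
}

for start, clusters in sliding_clusters(counts, window=8, step=4, n_segments=8):
    print(start, clusters)
```

In this toy run, "goal" and "penalty" share a single-burst string and end up in the same cluster, while the flat series ("the") and the oscillating one ("coffee") are filtered out. Because the grouping step is a dictionary lookup on short strings rather than a pairwise distance computation, its cost grows linearly with the number of tokens, which is one plausible reading of the low complexity the abstract reports.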

	Year of publication
	
				2016
			
	Keywords
	
				event detection; microblog analysis; Symbolic Aggregate ApproXimation; temporal mining
			
	Type
	
				01 Journal publication::01a Journal article
			
	Citation
	
				Efficient temporal mining of micro-blog texts and its application to event discovery / Velardi, P., Stilo, G.. - In: DATA MINING AND KNOWLEDGE DISCOVERY. - ISSN 1384-5810. - PRINT. - 30:2(2016), pp. 372-402. [10.1007/s10618-015-0412-3]
			
	Belongs to type:
	
				01a Journal article

Files attached to this product

File	Size	Format
Stilo_Temporal-Mining_2016.pdf (archive managers only). Type: Publisher's version (published version with the publisher's layout). Licence: All rights reserved	1.62 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/780485
