Catalogue of research products

Text clustering methods allow a large set of documents to be classified automatically, and many of the algorithms proposed for structured data can be applied to them. However, once the corpus has been transformed from unstructured information into structured data, it is high-dimensional and its clusters overlap, which can hinder the interpretability of the cluster descriptions. In this paper, we introduce a new method for detecting cluster centroids. Centroids represent prototypes of mutually exclusive partitions and can therefore make the results easier to interpret when describing groups. In this approach, after the preprocessing step, we establish links between documents using co-occurrence information on lexical units, and we use centrality measures to weight texts and classify documents. We analyze 1,650 job advertisements published from January 1st, 2010 to April 5th, 2011 by 496 companies on DB SOUL (System University Orientation and Job).
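The pipeline outlined in the abstract (link documents through shared lexical units, weight them by centrality, use the most central documents as cluster prototypes) can be illustrated with a small sketch. The abstract does not state which link definition or centrality measure the authors use, so everything below is an assumption made for illustration, not the paper's method: documents are taken as already preprocessed into sets of lexical units, a link's weight is the number of units two documents share, centrality is weighted degree, and the hypothetical helpers `cooccurrence_graph`, `weighted_degree` and `cluster_by_centroids` pick up to `k` centroids greedily, skipping any candidate that shares more than `max_shared` units with a centroid already chosen.

```python
from __future__ import annotations

from itertools import combinations


def cooccurrence_graph(docs: list[set[str]]) -> dict[int, dict[int, int]]:
    """Link every pair of documents that share lexical units.

    Edge weight = number of shared lexical units (an assumption of this
    sketch, not a definition taken from the paper).
    """
    graph: dict[int, dict[int, int]] = {i: {} for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        shared = len(docs[i] & docs[j])
        if shared:
            graph[i][j] = shared
            graph[j][i] = shared
    return graph


def weighted_degree(graph: dict[int, dict[int, int]]) -> dict[int, int]:
    """Centrality of a document = sum of the weights of its links."""
    return {node: sum(neighbours.values()) for node, neighbours in graph.items()}


def cluster_by_centroids(
    docs: list[set[str]], k: int, max_shared: int = 0
) -> tuple[list[int], dict[int, int | None]]:
    """Pick up to k central documents as prototypes and assign the rest.

    Candidates are visited from most to least central (ties: lower index
    first); a candidate becomes a centroid only if it shares at most
    `max_shared` lexical units with every centroid already chosen.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    graph = cooccurrence_graph(docs)
    centrality = weighted_degree(graph)

    centroids: list[int] = []
    for node in sorted(centrality, key=lambda n: (-centrality[n], n)):
        if len(centroids) == k:
            break
        if all(graph[node].get(c, 0) <= max_shared for c in centroids):
            centroids.append(node)

    labels: dict[int, int | None] = {}
    for node in graph:
        if node in centroids:
            labels[node] = node
            continue
        # Strongest link wins; ties go to the more central centroid.
        best = max(centroids, key=lambda c: (graph[node].get(c, 0), centrality[c]))
        # Documents with no link to any centroid stay unassigned (None).
        labels[node] = best if graph[node].get(best, 0) > 0 else None
    return centroids, labels


if __name__ == "__main__":
    # Toy, made-up "advertisements" already reduced to lexical units.
    docs = [
        {"python", "sql", "data"},
        {"python", "data", "statistics"},
        {"sales", "marketing", "customer"},
        {"marketing", "customer", "retail"},
        {"python", "sql", "cloud"},
    ]
    centroids, labels = cluster_by_centroids(docs, k=2)
    print(centroids)  # [0, 2]
    print(labels)  # {0: 0, 1: 0, 2: 2, 3: 2, 4: 0}
```

The greedy skip rule is just one simple way to keep prototypes apart so that the resulting partitions are mutually exclusive; on a real vocabulary `max_shared` would need tuning, and weighted degree could be swapped for another centrality measure such as closeness, betweenness or eigenvector centrality.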

Text clustering based on centrality measures: an application on job advertisements / Iezzi, D. F., Mastrangelo, M., Sarlo, S. - PRINT. - (2012), pp. 515-524. (11th International Conference on Textual Data Statistical Analysis, Liège, 13-15 June 2012).

Text clustering based on centrality measures: an application on job advertisements

Domenica Fioredistella Iezzi; Mario Mastrangelo; Scipione Sarlo

2012

Full record

Year of publication: 2012
Conference name: 11th International Conference on Textual Data Statistical Analysis
Keywords: centroid; text clustering; centrality measures
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this product

There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/511909

Warning: the data displayed have not been validated by the university.
