Text clustering methods allow the automatic classification of large sets of documents. Many algorithms originally proposed for structured data can be applied to this task. However, once a corpus has been transformed from unstructured text into structured data, it exhibits high dimensionality and overlapping clusters, which can make the cluster descriptions hard to interpret. In this paper, we introduce a new method for detecting cluster centroids. Centroids represent prototypes of mutually exclusive partitions and can therefore make the resulting groups easier to interpret and describe. In this approach, after a preprocessing step, we establish links between documents using co-occurrence information within selected lexical units. We then use centrality measures to weight the texts and classify the documents. We analyze 1,650 job announcements published from January 1, 2010 to April 5, 2011 by 496 companies on DB SOUL (System University Orientation and Job).
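The pipeline outlined in the abstract (link documents by co-occurrence, weight them by centrality, use the most central ones as centroids) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the lexical units, the centrality measure, or the assignment rule, so the sketch assumes whitespace tokens as lexical units, weighted degree as the centrality measure, a greedy rule that skips candidates linked to an already chosen centroid, and assignment of each document to its most strongly linked centroid. All function names and the toy job titles are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_links(docs):
    """Link two documents with a weight equal to the number of
    lexical units (here: lowercase whitespace tokens) they share."""
    term_sets = [set(d.lower().split()) for d in docs]
    weights = defaultdict(dict)
    for i, j in combinations(range(len(docs)), 2):
        shared = len(term_sets[i] & term_sets[j])
        if shared:
            weights[i][j] = shared
            weights[j][i] = shared
    return weights


def weighted_degree(weights, n):
    """Centrality of a document = sum of the weights of its links."""
    return [sum(weights[i].values()) for i in range(n)]


def pick_centroids(weights, centrality, k):
    """Greedy choice (an assumption of this sketch): scan documents by
    decreasing centrality and keep those not linked to a chosen centroid."""
    order = sorted(range(len(centrality)), key=lambda i: (-centrality[i], i))
    centroids = []
    for i in order:
        if all(c not in weights[i] for c in centroids):
            centroids.append(i)
        if len(centroids) == k:
            break
    return centroids


def assign(weights, centroids, n):
    """Label each document with its most strongly linked centroid,
    or None if it shares no lexical unit with any centroid."""
    labels = []
    for i in range(n):
        if i in centroids:
            labels.append(i)
            continue
        best = max(centroids, key=lambda c: weights[i].get(c, 0))
        labels.append(best if weights[i].get(best, 0) > 0 else None)
    return labels


# Toy corpus of invented job titles, for illustration only.
docs = [
    "java developer software engineer",
    "software engineer java backend",
    "junior java developer",
    "sales manager retail store",
    "retail sales assistant store",
    "store manager retail",
]
links = build_links(docs)
centrality = weighted_degree(links, len(docs))
centroids = pick_centroids(links, centrality, k=2)
labels = assign(links, centroids, len(docs))
print(centroids, labels)
```

On this toy corpus the two selected centroids are one software posting and one retail posting, and every other document is labelled with the centroid it shares the most terms with. A real corpus would need the preprocessing step the abstract mentions (for example stop-word removal) before links are built.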

Text clustering based on centrality measures: an application on job advertisements / Domenica Fioredistella, Iezzi; Mastrangelo, Mario; Scipione, Sarlo. - PRINT. - (2012), pp. 515-524. (Paper presented at the 11th International Conference on Textual Data Statistical Analysis, held in Liège, 13-15 June 2012).

Text clustering based on centrality measures: an application on job advertisements

MASTRANGELO, MARIO
2012

11th International Conference on Textual Data Statistical Analysis
centroid; text clustering; centrality measures
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/511909