Text clustering is a well-established technique in information retrieval, and numerous methods have been proposed for classifying words, documents, or both together. Textual data are frequently encoded using vector models, so that the corpus is transformed into a term-by-document matrix; using this representation, text clustering generates groups of similar objects on the basis of the presence or absence of words in the documents. An alternative way to work with texts is to represent them as a network in which nodes are entities connected by the presence and distribution of the words in the documents. In this work, after summarising the state of the art in text clustering, we present a new network approach to textual data. We undertake text co-clustering using methods developed for social network analysis. Several experimental results are presented to demonstrate the validity of the approach and its advantages over existing methods.

Network text analysis: A two-way classification approach / Celardo, L.; Everett, M. G.. - In: INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT. - ISSN 0268-4012. - 51:(2020). [10.1016/j.ijinfomgt.2019.09.005]

Network text analysis: A two-way classification approach

Celardo L.; Everett M. G.
2020

Co-clustering; Network text analysis; Text mining
01 Journal publication::01a Journal article
Files attached to this record
File: Celardo, Everett (2020).pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 1.66 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1384963