Document clustering / Cozzolino, I., Ferraro, M.B.. - In: WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL STATISTICS. - ISSN 1939-0068. - (2022), pp. -1. [10.1002/wics.1588]

Document clustering

Irene Cozzolino; Maria Brigida Ferraro

2022

Abstract

The explosive growth of text data calls for new, computationally efficient methods, backed by sound theory, tailored to the analysis of such large-scale data. Most of this unstructured data is unlabeled, so unsupervised learning techniques prove useful in this field. Document clustering has proven to be an efficient tool for organizing textual documents and has been widely applied in areas ranging from information retrieval to topic modeling. Before the document clustering algorithms are introduced, the principal steps of the whole process are discussed, including the mathematical representation of documents and the preprocessing phase. The main clustering algorithms used for text data are then critically analyzed, covering prototype-based, graph-based, hierarchical, and model-based approaches.
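As a rough illustration of the pipeline the abstract outlines (preprocessing, mathematical representation of documents, then a prototype-based clustering step), here is a minimal sketch in plain Python. It is not the authors' method: the stopword list, the smoothed TF-IDF weighting, the use of spherical k-means, the toy documents, and all function names are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; real pipelines use much larger lists).
STOPWORDS = {"the", "a", "of", "and", "in", "on", "is", "to"}


def tokenize(text):
    """Preprocessing: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Representation: one L2-normalized sparse TF-IDF vector (dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (assumed weighting): ln((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse unit vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spherical_kmeans(vectors, seeds, iterations=10):
    """Prototype-based clustering with cosine similarity.

    `seeds` lists the indices of the documents used as initial centroids.
    """
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the normalized sum of its members.
        for k in range(len(centroids)):
            total = {}
            for v, lab in zip(vectors, labels):
                if lab == k:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            if not total:
                continue  # empty cluster: keep the previous centroid
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[k] = {t: w / norm for t, w in total.items()}
    return labels


# Toy corpus (hypothetical): two documents about cats, two about markets.
docs = [
    "The cat sat on the mat",
    "A cat and a kitten play",
    "Stock markets fell sharply",
    "Markets and stock prices rise",
]
labels = spherical_kmeans(tfidf_vectors(docs), seeds=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

Seeding the centroids with fixed document indices keeps the sketch deterministic; practical implementations usually rely on random or k-means++ style initialization and repeated restarts.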

	Year of publication
	
				2022
			
	Keywords
	
				document clustering; document representation; graph-based methods; hierarchical methods; model-based methods; prototype-based methods; text data
			
	Type
	
				01 Journal publication::01a Journal article
			

Files attached to this record

File	Size	Format
Cozzolino_Document-clustering_2022.pdf (open access). Note: the journal is on the list of those in which open-access publication is possible under the 2020-2023 TRANSFORMATIVE AGREEMENT with WILEY (https://web.uniroma1.it/sbs/agevolazioni-open-access-sapienza). Type: Publisher's version (published version with the publisher's layout). License: All rights reserved.	1.43 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1641298
