A Fuzzy clustering approach for textual data / Cozzolino, Irene; Ferraro, Maria Brigida; Winker, Peter. - (2021), pp. 770-776. (Paper presented at the 51st Meeting of the Italian Statistical Society, held in Pisa).

A Fuzzy clustering approach for textual data

Irene Cozzolino; Maria Brigida Ferraro; Peter Winker
2021

Abstract

Document clustering is the process of partitioning a corpus of documents into distinct clusters based on content similarity. Traditional (hard or fuzzy) document clustering algorithms usually rely on a vector representation of documents based on the bag-of-words (BOW) approach, which makes the vector representation of the corpus very high-dimensional. In recent years, spectral clustering has been extensively applied in the field of text classification with support vector machines (SVMs) in combination with string kernels, but little has been done in the field of fuzzy document clustering with kernel-based methods. This work proposes a novel approach to text clustering that groups documents into clusters using a new version of fuzzy spectral clustering with string kernels.
ISBN: 9788891927361
Files attached to this record

Cozzolino_Fuzzy-clustering-approach_2021.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 37.4 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1575783