A Fuzzy clustering approach for textual data / Cozzolino, Irene; Ferraro, Maria Brigida; Winker, Peter. - (2021), pp. 770-776. (Paper presented at the 51st Meeting of the Italian Statistical Society, held in Pisa).
A Fuzzy clustering approach for textual data
Irene Cozzolino; Maria Brigida Ferraro
2021
Abstract
Document clustering is the process of partitioning a corpus of documents into distinct clusters based on content similarity. Traditional (hard or fuzzy) document clustering algorithms usually rely on a vector representation of documents based on the bag-of-words (BOW) approach, which leads to a very high-dimensional vector representation of the corpus. In recent years, spectral clustering has been extensively applied in the field of text classification with support vector machines (SVMs) in combination with string kernels, but little has been done in the field of fuzzy document clustering with kernel-based methods. This work proposes a novel approach to text clustering that groups documents into clusters using a new version of fuzzy spectral clustering with string kernels.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
Cozzolino_Fuzzy-clustering-approach_2021.pdf | Open access | Publisher's version (published version with the publisher's layout) | All rights reserved | 37.4 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.