Topic Homogeneity Test-Based Fuzzy Document Clustering / Sangiovanni, Gian Mario; Kontoghiorghes, Louisa; Colubi, Ana; Ferraro, Maria Brigida. - (2025), pp. 294-303. - STUDIES IN CLASSIFICATION, DATA ANALYSIS, AND KNOWLEDGE ORGANIZATION. [10.1007/978-3-032-03042-9].

Topic Homogeneity Test-Based Fuzzy Document Clustering

Gian Mario Sangiovanni (first author); Ana Colubi; Maria Brigida Ferraro (last author)
2025

Abstract

A new fuzzy document clustering algorithm based on topic homogeneity is introduced. Specifically, a novel dissimilarity measure is proposed, derived from the $p$-value of a hypothesis test that assesses the homogeneity of topic distributions between two documents. First, the topic distributions are derived through Latent Dirichlet Allocation, and then a bootstrap procedure is applied to obtain the $p$-value. Finally, the resulting dissimilarity matrix is integrated into a fuzzy relational clustering procedure. The performance of the proposal is evaluated using a benchmark dataset.
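The abstract does not specify the test statistic, the resampling scheme, how the $p$-value is mapped to a dissimilarity, or which fuzzy relational algorithm is used. The Python sketch below shows one way such a pipeline could fit together, under assumptions made purely for illustration and not taken from the chapter: each document is summarised by a vector of token counts per topic (for example, token-topic assignments from an already fitted LDA model, which is not fitted here); homogeneity is measured with a Pearson chi-square statistic whose null distribution is approximated by a parametric bootstrap from the pooled topic proportions; the dissimilarity is taken as $1 - p$; and clustering uses relational fuzzy c-means (RFCM) on squared dissimilarities. All function names (`chi2_homogeneity`, `bootstrap_pvalue`, `dissimilarity_matrix`, `rfcm`) are hypothetical.

```python
import random


def chi2_homogeneity(a, b):
    """Pearson chi-square statistic for H0: topic-count vectors a and b share one distribution."""
    n1, n2 = sum(a), sum(b)
    stat = 0.0
    for ak, bk in zip(a, b):
        pooled = (ak + bk) / (n1 + n2)
        if pooled == 0:
            continue  # a topic absent from both documents contributes nothing
        for obs, n in ((ak, n1), (bk, n2)):
            expected = n * pooled
            stat += (obs - expected) ** 2 / expected
    return stat


def bootstrap_pvalue(a, b, n_boot=499, rng=None):
    """Assumed scheme: resample both documents from the pooled topic distribution (H0 true)."""
    rng = rng or random.Random(0)
    n1, n2, n_topics = sum(a), sum(b), len(a)
    pooled = [ak + bk for ak, bk in zip(a, b)]  # weights proportional to pooled proportions
    observed = chi2_homogeneity(a, b)
    exceed = 0
    for _ in range(n_boot):
        draws1 = rng.choices(range(n_topics), weights=pooled, k=n1)
        draws2 = rng.choices(range(n_topics), weights=pooled, k=n2)
        a_star = [draws1.count(t) for t in range(n_topics)]
        b_star = [draws2.count(t) for t in range(n_topics)]
        # small tolerance so floating-point ties with the observed value count as exceedances
        if chi2_homogeneity(a_star, b_star) >= observed - 1e-12:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


def dissimilarity_matrix(counts, n_boot=499, seed=0):
    """Hypothetical dissimilarity d_ij = 1 - p_ij: homogeneous pairs (large p) end up close."""
    rng = random.Random(seed)
    n = len(counts)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = 1.0 - bootstrap_pvalue(counts[i], counts[j], n_boot, rng)
    return D


def rfcm(D, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Relational fuzzy c-means (m > 1); returns a c x n membership matrix."""
    rng = random.Random(seed)
    n = len(D)
    R = [[d * d for d in row] for row in D]  # treat D as a distance; RFCM uses squared values
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):  # normalise so each document's memberships sum to 1
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    for _ in range(max_iter):
        dist = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            sw = sum(w)
            v = [wk / sw for wk in w]  # implicit prototype as convex weights over documents
            Rv = [sum(R[k][j] * v[j] for j in range(n)) for k in range(n)]
            vRv = sum(v[k] * Rv[k] for k in range(n))
            # d_ik = (R v_i)_k - v_i' R v_i / 2; clipped because non-Euclidean R can make it <= 0
            dist.append([max(Rv[k] - vRv / 2, 1e-12) for k in range(n)])
        new_U = [[1.0 / sum((dist[i][k] / dist[j][k]) ** (1.0 / (m - 1)) for j in range(c))
                  for k in range(n)] for i in range(c)]
        shift = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if shift < tol:
            break
    return U


if __name__ == "__main__":
    # toy topic counts (3 topics): documents 0-1 lean on topic 0, documents 2-3 on topic 1
    counts = [[40, 5, 5], [38, 7, 5], [6, 40, 4], [5, 42, 3]]
    D = dissimilarity_matrix(counts, n_boot=199, seed=1)
    U = rfcm(D, c=2)
    for k, doc in enumerate(counts):
        print(doc, [round(U[i][k], 3) for i in range(2)])
```

With this choice, two documents with identical topic counts get $p = 1$ and hence dissimilarity 0, while clearly heterogeneous pairs approach $B/(B+1)$ for $B$ bootstrap replicates. Plain RFCM assumes Euclidean-embeddable dissimilarities; for arbitrary $p$-value-based matrices, a non-Euclidean variant such as NERFCM would be the more robust choice than the clipping used here.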
Book title: Supervised and Unsupervised Statistical Data Analysis
ISBN: 978-3-032-03041-2; 978-3-032-03042-9
Keywords: document clustering; topic homogeneity test; bootstrap; latent Dirichlet allocation
Publication type: 02 Publication in a volume :: 02a Chapter or article

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1745394
Warning: the data displayed have not been validated by the university.
