Clustering is one of the most widely used tools in data analysis. In recent decades, owing to the increasing complexity of data, soft clustering has received a great deal of attention. Several approaches can be considered soft. The best known is the fuzzy approach, which assigns objects to clusters with membership degrees ranging in the unit interval; these degrees depend on the dissimilarities between each object and all the prototypes. Closely related is the possibilistic approach which, unlike the fuzzy one, relaxes some constraints on the membership degrees. In particular, objects are assigned to clusters with degrees of typicality that depend only on the dissimilarities between each object and the closest prototype. A further soft approach is the rough one. Here there are no degrees ranging between 0 and 1; instead, objects with intermediate features belong to the boundary region and are assigned to more than one cluster. Although it is not universally recognized in the scientific community as a soft clustering approach, in our view the model-based approach can also be considered as such. Model-based clustering methods also produce a soft partition of the objects, and the posterior probability of component membership may play a role similar to that of the membership degree. The four approaches are critically described from a theoretical point of view, and an empirical comparative analysis is carried out. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification; Statistical and Graphical Methods of Data Analysis > Multivariate Analysis; Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis.
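
To illustrate the contrast the abstract draws between fuzzy membership degrees and possibilistic typicalities, here is a minimal sketch. It is not taken from the article. It assumes the textbook fuzzy c-means membership formula and the textbook possibilistic typicality formula, in which each typicality is computed from the distance to that cluster's own prototype. The function names, the fuzzifier `m`, and the scale parameters `etas` are illustrative assumptions.

```python
def fuzzy_memberships(sq_dists, m=2.0):
    """Membership degrees of one object, given its squared distances to ALL prototypes.

    Assumed textbook fuzzy c-means formula; the degrees always sum to 1.
    """
    # An object lying exactly on one or more prototypes shares full membership among them.
    zeros = [d == 0.0 for d in sq_dists]
    if any(zeros):
        n_zero = sum(zeros)
        return [1.0 / n_zero if z else 0.0 for z in zeros]
    expo = 1.0 / (m - 1.0)
    return [1.0 / sum((d_g / d_k) ** expo for d_k in sq_dists) for d_g in sq_dists]


def typicalities(sq_dists, etas, m=2.0):
    """Degrees of typicality of one object (assumed textbook possibilistic formula).

    Each degree uses only the distance to its own prototype and a hypothetical
    per-cluster scale eta, so the degrees need not sum to 1.
    """
    expo = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d / eta) ** expo) for d, eta in zip(sq_dists, etas)]


# An object near prototype 1, and an outlier equally far from both prototypes.
near = [1.0, 4.0]
outlier = [100.0, 100.0]

print(fuzzy_memberships(near))            # sums to 1
print(fuzzy_memberships(outlier))         # still sums to 1 despite being far from both
print(typicalities(outlier, [1.0, 1.0]))  # both close to 0
```

Under these assumptions the outlier receives fuzzy memberships of one half in each cluster, because the sum-to-one constraint forces the degrees to be shared. Its typicalities are both close to zero, which is the practical effect of relaxing that constraint.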

Soft clustering / Ferraro, M. B.; Giordani, P.. - In: WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL STATISTICS. - ISSN 1939-0068. - (2020), pp. 1-12. [10.1002/wics.1480]

Soft clustering

Ferraro M. B.; Giordani P.
2020

Fuzzy approach; model-based approach; possibilistic approach; rough approach; soft clustering
01 Journal publication::01a Journal article
Files attached to this item
File: Ferraro_Soft-clustering_2019.pdf
Access: archive managers only
Type: Post-print document (version following peer review and accepted for publication)
License: All rights reserved
Size: 1.42 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1304557
Citations
  • PMC: ND
  • Scopus: 14
  • ISI: 11