
A skewness-based clustering method / Acquafredda, Luca. - (2018 Feb 26).

A skewness-based clustering method

ACQUAFREDDA, LUCA
26/02/2018

Abstract

Partitive clustering methods are among the earliest and best-known families of strategies in the field of clustering. The name comes from their main feature: all these methods start from an initial partition and modify it at every step of the process according to a known criterion, until a given convergence rule is satisfied. In other words, as pointed out by Äyrämö and Kärkkäinen (2006), they work essentially as iterative allocation algorithms. In this framework, we do not focus only on “canonical” approaches such as K-means and fuzzy C-means, but also discuss some recent symmetry-based partitive clustering methods, mostly developed in the context of computer science and engineering. As will be shown, these approaches seem to provide encouraging results, especially in image recognition and related applications, and for this reason they represent a starting point for our work. In this respect, we are particularly interested in the case of overlapping clusters which, as we will clarify, may be critical for most of the clustering methods we have considered. In particular, we started our analysis by noting that, with high-dimensional data and overlapping clusters, it may be difficult to choose the component-specific distributions, and no graphical device can help. We therefore decided to investigate nonparametric approaches to clustering. In this framework, we focused on clusters with elliptical shapes, with Gaussian mixtures as a special case. We then realized that, for elliptical shapes, symmetry could be a “natural” choice, so we searched for clustering approaches of this kind and found the symmetry-based methods cited above. Surprisingly, none of them was designed for elliptical clusters, since their aim is essentially image recognition of different symmetric shapes.
We therefore decided to address this issue and to test whether a suitable function of symmetry could improve clustering results in the case of elliptical overlapping clusters. Since we are interested in elliptical shapes, another broad subject that we discuss is the Gaussian mixture model. In this context, our interest is in the EM-based Mclust algorithm from the R package mclust; see Fraley and Raftery (1999). Thus, our work addresses both topics: partitive clustering methods (with a focus on the symmetry-based approach) and Gaussian model-based clustering. The main reason for addressing two partially different subjects derives from the essential feature of our proposal: a symmetry-based partitive method intended to deal with elliptical clusters (with Gaussian clusters as a special case). Accordingly, we evaluate our clustering performance through a comparison with the Gaussian mixture model implemented in mclust; see Fraley and Raftery (1999). This is surely a challenging task, since that method has home-court advantage in the case of Gaussian clusters. As pointed out before, we are mainly interested in overlapping clusters. A starting point for our work was the assumption that Mclust, even in its “natural” framework of Gaussian mixtures, could have problems in centroid estimation when clusters are highly overlapping. This drawback could be related to its dependence on the multivariate Gaussian density. So, we searched for a nonparametric skewness-based method that could be appropriate for elliptical distributions (including the Gaussian) in the case of overlapping clusters. This is exactly the framework of the proposed Sbam (Skewness-Based Allocation Method) algorithm.
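
As an illustration of the iterative allocation scheme mentioned in the abstract (initial partition, repeated reallocation under a fixed criterion, convergence rule), a minimal K-means sketch in Python might look as follows. It is not the Sbam algorithm and not the mclust implementation; the function name `kmeans`, its parameters, the squared Euclidean criterion and the toy data are assumptions made only for this example.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means sketch: an iterative allocation scheme (illustrative only)."""
    rng = random.Random(seed)
    # Initial partition: use k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Allocation step: assign each point to its nearest centroid
        # (squared Euclidean distance is the criterion assumed here).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Convergence rule: stop when the partition no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(points, k=2)
```

With well-separated groups like these, the partition stabilizes after a few reallocations; with highly overlapping clusters, the case the thesis is concerned with, the result of such a scheme depends much more on the allocation criterion.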
Files attached to this record

File: Tesi dottorato Acquafredda
Access: open access
Type: Doctoral thesis
License: Creative Commons
Size: 1.87 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1086091