We propose a new method for the simultaneous reduction of units and variables in a data matrix. Reduced K-Means (RKM) and Factorial K-Means (FKM) are two well-known techniques used in this context. Both combine principal component analysis with K-means, but they work in different ways. RKM maximizes the between-clusters deviance without imposing any condition on the within-clusters deviance, whereas FKM minimizes the within-clusters deviance without imposing any condition on the between-clusters deviance. Hence, RKM and FKM give different results: the partition obtained by RKM may contain isolated but heterogeneous clusters, while the one obtained by FKM may contain homogeneous but not isolated clusters. FKM can be used when RKM fails, and vice versa. For this reason, we propose combining the two techniques in a general model through a convex linear combination. In doing so, we approach the problem in a fuzzy framework. We investigate the adequacy of the proposal by means of simulations and real case studies.
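The abstract does not state the objective functions, so the following is only a minimal sketch under stated assumptions. It uses the least-squares formulations commonly given for RKM (fit the data by centroids mapped back from the subspace) and FKM (fit the projected scores by their centroids), with a hard partition and an orthonormal loading matrix. The function names, the weight `alpha`, and the choice of which criterion `alpha` multiplies are hypothetical, and the fuzzy memberships of the proposed model are not reproduced here.

```python
import numpy as np

def subspace_centroids(X, labels, A, k):
    """Centroids (k x q) of the component scores X @ A; assumes no empty cluster."""
    Y = X @ A
    return np.vstack([Y[labels == c].mean(axis=0) for c in range(k)])

def rkm_loss(X, labels, A, k):
    """Assumed RKM criterion: ||X - U Ybar A'||^2 in the original variable space."""
    Ybar = subspace_centroids(X, labels, A, k)
    return float(np.sum((X - Ybar[labels] @ A.T) ** 2))

def fkm_loss(X, labels, A, k):
    """Assumed FKM criterion: ||X A - U Ybar||^2 in the reduced subspace."""
    Ybar = subspace_centroids(X, labels, A, k)
    return float(np.sum((X @ A - Ybar[labels]) ** 2))

def combined_loss(X, labels, A, k, alpha):
    """Hypothetical convex combination; alpha in [0, 1] weights the RKM term."""
    return alpha * rkm_loss(X, labels, A, k) + (1.0 - alpha) * fkm_loss(X, labels, A, k)

# Illustrative random data only (not from the study): 60 units, 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
X = X - X.mean(axis=0)                          # column-centre the data
labels = np.arange(60) % 3                      # arbitrary hard partition, 3 clusters
A, _ = np.linalg.qr(rng.normal(size=(5, 2)))    # orthonormal 5 x 2 loadings

for alpha in (0.0, 0.5, 1.0):
    print(alpha, combined_loss(X, labels, A, 3, alpha))
```

With orthonormal loadings, the two assumed criteria differ only by the part of the data lying outside the subspace, `||X - X A A'||^2`, which is why a single weight can move smoothly between them.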
A general method for clustering in a reduced subspace / Ferraro, Maria Brigida; Giordani, Paolo; Vichi, Maurizio. - (2015), p. 48. (Paper presented at The 2015 International Meeting of the Psychometric Society, held in Beijing (China), 11-16 July 2015).
A general method for clustering in a reduced subspace
Ferraro, Maria Brigida; Giordani, Paolo; Vichi, Maurizio
2015