Functional projection K-means / Rocci, Roberto; Gattone, Stefano A.. - In: JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS. - ISSN 1537-2715. - (2024). [10.1080/10618600.2024.2429706]
Functional projection K-means
Roberto Rocci; Stefano A. Gattone
2024
Abstract
A new technique for simultaneous clustering and dimensionality reduction of functional data is proposed. The observations are projected into a low-dimensional subspace and clustered by means of a functional K-means. The subspace and the partition are estimated simultaneously by minimizing the within deviance in the reduced space. This allows us to find new dimensions with a very low within deviance, which should correspond to a high level of discriminant power. However, in some cases, the total deviance explained by the new dimensions is so low as to make the subspace, and therefore the partition identified in it, insignificant. To overcome this drawback, we add to the loss a penalty equal to the negative total deviance in the reduced space. In this way, subspaces with a low deviance are avoided. We show how several existing methods are particular cases of our proposal simply by varying the weight of the penalty. The estimation is improved by adding a regularization term to the loss in order to take into account the functional nature of the data by smoothing the centroids. In contrast to existing literature, which largely considers the smoothing as a pre-processing step, in our proposal regularization is integrated with the identification of both subspace and cluster partition. An alternating least squares algorithm is introduced to compute model parameter estimates. The effectiveness of our proposal is demonstrated through its application to both real and simulated data. Supplementary materials for this article are available online.
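The penalized objective described in the abstract can be illustrated with a toy alternating least squares sketch. This is our own simplification, not the authors' code: it works on plain multivariate data rather than functions, and it omits the smoothing/regularization term. The function name `projection_kmeans` and the weight `alpha` (the penalty weight on the negative total deviance in the reduced space) are illustrative labels; with the centroids concentrated out, the V-step reduces to an eigenvalue problem.

```python
import numpy as np

def projection_kmeans(X, K, q, alpha=0.5, n_iter=30, seed=0):
    """Toy ALS sketch (a simplification, not the paper's implementation):
    minimize ||XV - UM||_F^2 - alpha * ||XV||_F^2 over an orthonormal
    projection V (p x q), memberships U, and reduced-space centroids M."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                  # deviances from the grand mean
    n, p = X.shape
    V = np.linalg.svd(X, full_matrices=False)[2][:q].T   # PCA start
    Y = X @ V
    M = Y[rng.choice(n, K, replace=False)]  # random rows as first centroids
    for _ in range(n_iter):
        # U-step: assign each row to its nearest reduced-space centroid
        d = ((Y[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # V-step: with centroids concentrated out the loss becomes
        # tr(V' [(1 - alpha) X'X - B] V), B the between-cluster scatter,
        # minimized by the q eigenvectors with the smallest eigenvalues
        B = np.zeros((p, p))
        for k in range(K):
            nk = int(np.sum(labels == k))
            if nk:
                mk = X[labels == k].mean(axis=0, keepdims=True)
                B += nk * mk.T @ mk
        A = (1 - alpha) * (X.T @ X) - B
        V = np.linalg.eigh(A)[1][:, :q]     # ascending order -> smallest q
        # M-step: recompute centroids in the new reduced space
        Y = X @ V
        M = np.vstack([Y[labels == k].mean(axis=0) if np.any(labels == k)
                       else Y[rng.integers(n)]     # re-seed an empty cluster
                       for k in range(K)])
    return labels, V
```

With `alpha = 0` this sketch falls back to a reduced K-means criterion, consistent with the abstract's remark that existing methods are recovered as particular cases by varying the penalty weight.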