The primary objective of simultaneous methodologies for clustering and variable reduction is to identify both the optimal partition of units and the optimal subspace of variables, all at once. The optimality is typically determined using least squares or maximum likelihood estimation methods. These simultaneous techniques are particularly useful when working with Big Data, where the reduction (synthesis) is essential for both units and variables. Furthermore, a secondary objective of reducing variables through a subspace is to enhance the interpretability of the latent variables identified by the subspace using specific methodologies. The drclust package implements double K-means (KM), reduced KM, and factorial KM to address the primary objective. KM with disjoint principal components addresses both the primary and secondary objectives, while disjoint principal component analysis and disjoint factor analysis address the latter, producing the sparsest loading matrix. The models are implemented in C++ for faster execution, processing large data matrices in a reasonable amount of time.
drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction / Prunila, Ionel; Vichi, Maurizio. - In: THE R JOURNAL. - ISSN 2073-4859. - 17:4(2026), pp. 103-132. [10.32614/RJ-2025-046]
drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction
Prunila Ionel
Writing – Original Draft Preparation
;Vichi MaurizioSupervision
2026
Abstract
The primary objective of simultaneous methodologies for clustering and variable reduction is to identify both the optimal partition of units and the optimal subspace of variables, all at once. The optimality is typically determined using least squares or maximum likelihood estimation methods. These simultaneous techniques are particularly useful when working with Big Data, where the reduction (synthesis) is essential for both units and variables. Furthermore, a secondary objective of reducing variables through a subspace is to enhance the interpretability of the latent variables identified by the subspace using specific methodologies. The drclust package implements double K-means (KM), reduced KM, and factorial KM to address the primary objective. KM with disjoint principal components addresses both the primary and secondary objectives, while disjoint principal component analysis and disjoint factor analysis address the latter, producing the sparsest loading matrix. The models are implemented in C++ for faster execution, processing large data matrices in a reasonable amount of time.| File | Dimensione | Formato | |
|---|---|---|---|
|
Prunila_drclust_2025.pdf
solo gestori archivio
Note: drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
1.59 MB
Formato
Adobe PDF
|
1.59 MB | Adobe PDF | Contatta l'autore |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


