drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction / Prunila, Ionel; Vichi, Maurizio. - In: THE R JOURNAL. - ISSN 2073-4859. - 17:4(2026), pp. 103-132. [10.32614/RJ-2025-046]

drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction

Prunila Ionel (Writing – Original Draft Preparation); Vichi Maurizio (Supervision)
2026

Abstract

The primary objective of simultaneous methodologies for clustering and variable reduction is to identify the optimal partition of units and the optimal subspace of variables at the same time. Optimality is typically defined by a least-squares or maximum-likelihood criterion. These simultaneous techniques are particularly useful with Big Data, where both units and variables need to be reduced (synthesized). A secondary objective of reducing variables through a subspace is to make the latent variables that span the subspace easier to interpret, using dedicated methodologies. The drclust package implements double K-means (KM), reduced KM, and factorial KM to address the primary objective. KM with disjoint principal components addresses both the primary and the secondary objective. Disjoint principal component analysis and disjoint factor analysis address the secondary objective, producing the sparsest loading matrix. The models are implemented in C++ for faster execution, so that large data matrices can be processed in a reasonable amount of time.
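Reduced KM is one of the least-squares methods named in the abstract. A minimal sketch of its alternating least squares (ALS) estimation may help make the "simultaneous" objective concrete. The sketch is written in Python with NumPy and assumes the usual reduced K-means criterion. That criterion minimises ||X - U C A'||² over three quantities: a binary membership matrix U, the centroids C in the reduced space, and a column-orthonormal loading matrix A. The function `reduced_kmeans`, the helper `_centroids` and all parameter names are hypothetical. They do not reflect drclust's R interface or its C++ implementation.

```python
import numpy as np


def _centroids(Y, labels, n_clusters):
    """Cluster means of the rows of Y; rows for empty clusters are NaN."""
    C = np.full((n_clusters, Y.shape[1]), np.nan)
    for k in range(n_clusters):
        if np.any(labels == k):
            C[k] = Y[labels == k].mean(axis=0)
    return C


def reduced_kmeans(X, n_clusters, n_components, max_iter=100, tol=1e-9, seed=0):
    """Hypothetical ALS sketch of reduced K-means (not drclust's API or C++ code).

    Minimises ||Xc - U C A'||^2, where Xc is the column-centred data, U the
    binary unit-by-cluster membership matrix (stored as `labels`), C the
    centroids in the reduced space and A a column-orthonormal loading matrix.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # centre each variable
    n, p = X.shape
    rng = np.random.default_rng(seed)
    # Random initial partition in which every cluster has at least one unit.
    labels = rng.permutation(np.arange(n) % n_clusters)
    prev_loss = np.inf
    for _ in range(max_iter):
        # A-step: for a fixed partition, the optimal A holds the leading
        # eigenvectors of the between-cluster scatter matrix X' P X.
        B = np.zeros((p, p))
        for k in np.unique(labels):
            m = X[labels == k].mean(axis=0)
            B += np.sum(labels == k) * np.outer(m, m)
        _, eigvecs = np.linalg.eigh(B)             # eigenvalues in ascending order
        A = eigvecs[:, ::-1][:, :n_components]     # top n_components eigenvectors
        Y = X @ A                                  # unit scores in the subspace
        # U-step: K-means assignment of the projected units to the nearest
        # centroid; empty clusters (NaN centroids) get an infinite distance.
        C = _centroids(Y, labels, n_clusters)
        dist = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = np.where(np.isnan(dist), np.inf, dist).argmin(axis=1)
        C = _centroids(Y, labels, n_clusters)      # centroids of the new partition
        loss = ((X - C[labels] @ A.T) ** 2).sum()  # least-squares criterion
        if prev_loss - loss <= tol:
            break
        prev_loss = loss
    return labels, A, loss
```

The alternation rests on a decomposition of the criterion. For a fixed A, the loss splits into two parts: the residual outside the subspace, plus the K-means criterion on the projected scores XA. The partition step is therefore ordinary K-means in the reduced space. For a fixed partition, the best A is given by the leading eigenvectors of the between-cluster scatter matrix. Neither step can increase the loss, so the iterations converge, though only to a local optimum. In practice several random starts would be compared.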
Keywords: clustering; principal components; factors; ALS; disjoint components
Publication type: 01 Journal publication::01a Journal article
Files attached to this item
Prunila_drclust_2025.pdf: publisher's version (published with the publisher's layout); Adobe PDF, 1.59 MB; access restricted to archive managers; license: all rights reserved

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1760814