New technologies and methods of data collection make it necessary to summarise the large quantity of information that is available. Typically, we face a data matrix X of size (n x J), corresponding to n statistical units and J quantitative variables, where both n and J are very large. Clustering identifies homogeneous clusters of units, so it can be seen as a way to reduce the number of units to a smaller set of groups. Dimensionality reduction techniques yield latent dimensions (fewer than the manifest variables), so they reduce the dimensionality of the variable space. In this paper, we apply Double Hierarchical Parsimonious Means Clustering (Cavicchia et al., 2019) to the Asia-Europe Meeting (ASEM) data set in order to obtain, simultaneously, a hierarchical parsimonious clustering of units (aggregated around centroids) and a dimensionality reduction of variables (aggregated around components). The model is estimated by least squares (LS), and an efficient coordinate descent algorithm is given. The goodness of fit of the double hierarchical parsimonious trees can be computed to assess the quality of the two hierarchical partitions.
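
As a rough illustration of the two ideas combined in the abstract, the sketch below builds one hierarchy on the units and one on the variables of a synthetic matrix, then summarises the data by centroids in the reduced space. It is a two-step approach based on SciPy's standard agglomerative linkage, not the simultaneous least-squares estimation of Double Hierarchical Parsimonious Means Clustering. The data, the sizes n and J, the numbers of clusters K and Q, and all variable names are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic stand-in for the (n x J) data matrix X; this is NOT the ASEM data.
n, J = 60, 8
factors = rng.normal(size=(n, 2))
factors[n // 2:, 0] += 30.0  # the second half of the units forms a separate group
# (2 x J) loadings: variables 0-3 follow factor 1, variables 4-7 follow factor 2.
loadings = np.repeat(np.eye(2), J // 2, axis=1)
X = factors @ loadings + 0.3 * rng.normal(size=(n, J))

# Hierarchy on the units (rows): Ward linkage, cut into K clusters.
K = 2
unit_tree = linkage(X, method="ward")
unit_labels = fcluster(unit_tree, t=K, criterion="maxclust")

# Hierarchy on the variables (columns): average linkage on correlation
# distance, cut into Q groups.
Q = 2
var_tree = linkage(X.T, method="average", metric="correlation")
var_labels = fcluster(var_tree, t=Q, criterion="maxclust")

# Component scores: mean of the variables in each variable group (n x Q).
scores = np.column_stack(
    [X[:, var_labels == q].mean(axis=1) for q in range(1, Q + 1)]
)
# Centroids of the unit clusters in the reduced space (K x Q).
centroids = np.vstack(
    [scores[unit_labels == k].mean(axis=0) for k in range(1, K + 1)]
)

# Share of the total sum of squares reproduced by the (K x Q) centroid summary.
X_hat = centroids[unit_labels - 1][:, var_labels - 1]
fit = 1.0 - ((X - X_hat) ** 2).sum() / ((X - X.mean()) ** 2).sum()
print(centroids.round(2), round(fit, 3))
```

The two trees (`unit_tree`, `var_tree`) can be cut at other levels by changing K and Q. The `fit` value is only a simple least-squares summary for this sketch and is not the goodness-of-fit measure defined in the paper.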

Hierarchical clustering and dimensionality reduction for big data / Cavicchia, Carlo; Vichi, Maurizio; Zaccaria, Giorgia. - (2019), pp. 173-180. (Paper presented at the SIS 2019 conference, held in Milan, Italy).

Hierarchical clustering and dimensionality reduction for big data

Carlo Cavicchia; Maurizio Vichi; Giorgia Zaccaria
2019

SIS 2019
clustering; dimensionality reduction; big data; hierarchy
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item

File: Cavicchia_Hierarchical-Clustering_2019.pdf.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 703.74 kB
Format: Adobe PDF

File: Cavicchia_Indice_Hierarchical-Clustering_2019.pdf
Access: open access
Type: Other attached material
License: All rights reserved
Size: 5.61 MB
Format: Adobe PDF

File: Cavicchia_Quarta_Hierarchical-Clustering_2019.pdf
Access: open access
Type: Other attached material
License: All rights reserved
Size: 145.46 kB
Format: Adobe PDF

File: Cavicchia_Frontespizio_Hierarchical-Clustering_2019.pdf
Access: open access
Type: Other attached material
License: All rights reserved
Size: 305.06 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1292036