Supervised Nested Algorithm for Classification based on K-means / Nieddu, L.; Vicari, Donatella. - (2020), pp. 79-88. - STUDIES IN CLASSIFICATION, DATA ANALYSIS, AND KNOWLEDGE ORGANIZATION. [10.1007/978-981-15-3311-2].

Supervised Nested Algorithm for Classification based on K-means

L. Nieddu; Donatella Vicari
2020

Abstract

This paper presents an extension of the k-means algorithm, based on the idea of recursive partitioning, that can be used for supervised classification. Among the most robust supervised classification techniques are those based on classification trees, which make no assumptions about the parametric distribution of the data and recursively partition the feature space into subsets of units that are homogeneous with respect to the class they belong to. One shortcoming of these approaches is that the recursive partitioning of the data, i.e. the growing of the tree, considers only one variable at a time: although this keeps the classification rules simple, it can also make the resulting trees hard to interpret. Building on these ideas, we carry the integration of parametric models into trees one step further and propose a supervised classification algorithm based on the k-means routine that sequentially splits the data using the whole feature vector. Results from applications to simulated data are presented to assess the potential of the proposed method under different conditions.
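The abstract does not spell out the splitting rule, so the Python sketch below is only a hypothetical illustration of the general idea, not the authors' algorithm. It assumes that each node runs plain k-means (Lloyd iterations) on all features, seeded at the class means of the units in that node, and that any resulting cluster still containing more than one class is split again. The function names `build`, `lloyd` and `predict`, the seeding choice and the `max_depth` cap are all illustrative assumptions. A new unit is classified by descending the tree to the nearest centroid at each level.

```python
"""Hypothetical sketch of recursive k-means splits for supervised classification.

NOT the published method: an illustrative reading of "sequentially splitting
the data with k-means on the whole feature vector".
"""
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance over the whole feature vector."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def assign(X, centroids):
    """Index of the nearest centroid for every point (ties -> lowest index)."""
    return [min(range(len(centroids)), key=lambda j: dist2(x, centroids[j]))
            for x in X]


def lloyd(X, centroids, max_iter=100):
    """Plain k-means (Lloyd) iterations starting from the given centroids."""
    for _ in range(max_iter):
        labels = assign(X, centroids)
        new = []
        for j, c in enumerate(centroids):
            members = [x for x, l in zip(X, labels) if l == j]
            new.append(mean(members) if members else c)  # empty cluster keeps its centroid
        if new == centroids:
            break
        centroids = new
    return centroids, assign(X, centroids)


def build(X, y, depth=0, max_depth=20):
    """Assumed growth rule: split with k-means seeded at the class means,
    then recurse on every cluster that still mixes classes."""
    counts = Counter(y)
    majority = counts.most_common(1)[0][0]
    if len(counts) == 1 or depth >= max_depth:
        return {"label": majority}
    seeds = [mean([x for x, c in zip(X, y) if c == k]) for k in sorted(counts)]
    centroids, labels = lloyd(X, seeds)
    groups = sorted(set(labels))
    if len(groups) < 2:  # k-means separated nothing: stop with a majority leaf
        return {"label": majority}
    node = {"centroids": [], "children": []}
    for g in groups:
        idx = [i for i, l in enumerate(labels) if l == g]
        node["centroids"].append(centroids[g])
        node["children"].append(
            build([X[i] for i in idx], [y[i] for i in idx], depth + 1, max_depth))
    return node


def predict(node, x):
    """Descend to a leaf by following the nearest centroid at each level."""
    while "children" in node:
        cs = node["centroids"]
        node = node["children"][min(range(len(cs)), key=lambda i: dist2(x, cs[i]))]
    return node["label"]


# Toy data: class "a" forms two groups (near 0 and near 20) on either side of "b".
X = [[0, 0], [1, 0], [20, 0], [21, 0], [9, 0], [11, 0]]
y = ["a", "a", "a", "a", "b", "b"]
tree = build(X, y)
print([predict(tree, x) for x in X])  # ['a', 'a', 'a', 'a', 'b', 'b']
print(predict(tree, [10.5, 0]))       # b
```

On this toy data a single nearest-class-mean rule (class means 10.5 and 10 on the first coordinate) misclassifies the units at 0, 1 and 11, whereas the two levels of recursive splits separate all six. Because every split assigns a unit to its nearest centroid over the whole feature vector, the boundaries at each node are hyperplanes that need not be parallel to any coordinate axis, in contrast with single-variable tree splits.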
Book: Advanced Studies in Classification and Data Science
ISBN: 978-981-15-3310-5; 978-981-15-3311-2
Keywords: classification; imperfect supervisor; k-means.
Type: Publication in volume (chapter or article)
Files attached to this item

File: Nieddu_Supervised-nested-algorithm_2020.pdf (access: archive managers only)
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 6.4 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1180314
Citations
  • PubMed Central: not available
  • Scopus: 1
  • Web of Science: not available