The aim of this paper is to present an extension of the k-means algorithm based on the idea of recursive partitioning that can be used as a classification algorithm in the case of supervised classification. Some of the most robust techniques for supervised classification are those based on classification trees that make no assumptions on the parametric distribution of the data and are based on recursively partitioning the feature space into homogeneous subsets of units according to the class the entities belong to. One of the shortcomings of these approaches is that the recursive partitioning of the data, i.e. the growing of the tree, is achieved considering only one variable at a time and, although it makes the tree pretty simple in terms of determining rules to obtain a classification, it also makes them hard to interpret. Building on these ideas we carry the integration of parametric model into trees one step further and propose a supervised classification algorithm based on the k-means routine that sequentially splits the data according to the whole feature vector. Results from applications to simulated data are shown to address the potentiality of the proposed method in different conditions.
Supervised Nested Algorithm for Classification based on K-means / Nieddu, L.; Vicari, Donatella. - (2020), pp. 79-88. - STUDIES IN CLASSIFICATION, DATA ANALYSIS, AND KNOWLEDGE ORGANIZATION. [10.1007/978-981-15-3311-2].
Supervised Nested Algorithm for Classification based on K-means
L. Nieddu
;Donatella Vicari
2020
Abstract
The aim of this paper is to present an extension of the k-means algorithm based on the idea of recursive partitioning that can be used as a classification algorithm in the case of supervised classification. Some of the most robust techniques for supervised classification are those based on classification trees that make no assumptions on the parametric distribution of the data and are based on recursively partitioning the feature space into homogeneous subsets of units according to the class the entities belong to. One of the shortcomings of these approaches is that the recursive partitioning of the data, i.e. the growing of the tree, is achieved considering only one variable at a time and, although it makes the tree pretty simple in terms of determining rules to obtain a classification, it also makes them hard to interpret. Building on these ideas we carry the integration of parametric model into trees one step further and propose a supervised classification algorithm based on the k-means routine that sequentially splits the data according to the whole feature vector. Results from applications to simulated data are shown to address the potentiality of the proposed method in different conditions.File | Dimensione | Formato | |
---|---|---|---|
Nieddu_Supervised-nested-algorithm_2020.pdf
solo gestori archivio
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
6.4 MB
Formato
Adobe PDF
|
6.4 MB | Adobe PDF | Contatta l'autore |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.