Analyzing categorical data in machine learning generally requires a coding strategy. This problem is common to multivariate statistical techniques and several approaches have been suggested in the literature. This article proposes a method for analyzing categorical variables with neural networks. Both a supervised and unsupervised approach were considered, in which the variables can have high cardinality. Some simulated data applications illustrate the interest of the proposal.
Optimal Coding of Categorical Data in Machine Learning / DI CIACCIO, Agostino. - (2023). - STUDIES IN CLASSIFICATION, DATA ANALYSIS, AND KNOWLEDGE ORGANIZATION.
Optimal Coding of Categorical Data in Machine Learning
Agostino Di Ciaccio
Primo
2023
Abstract
Analyzing categorical data in machine learning generally requires a coding strategy. This problem is common to multivariate statistical techniques and several approaches have been suggested in the literature. This article proposes a method for analyzing categorical variables with neural networks. Both a supervised and unsupervised approach were considered, in which the variables can have high cardinality. Some simulated data applications illustrate the interest of the proposal.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


