ETrees: A Generalization of Conditional Trees to Mixed-Type Data / Giubilei, Riccardo; Padellini, Tullia; Brutti, Pierpaolo. - (2019). (Paper presented at the 32nd Edition of the European Meeting of Statisticians, held in Palermo).
ETrees: A Generalization of Conditional Trees to Mixed-Type Data
Riccardo Giubilei (first author); Pierpaolo Brutti (last author)
2019
Abstract
Conditional Trees (CTrees) are a special case of recursive binary partitioning models that use statistical tests to achieve unbiased variable selection and to address overfitting; so far, they have been applied only to nominal or numeric variables. We propose an extension to mixed-type data, where covariates may include functional data, graphs, and persistence diagrams. The tests that, in CTrees, drive both variable selection and the stopping criterion are here carried out with energy statistics. Because energy statistics can compare variables that need not be defined on the same space, they make it possible to model mixed-type covariates simultaneously. The resulting Energy Trees (ETrees) are therefore a general model that applies to a number of cases where other models are not viable, with the added advantage of resting firmly on statistical testing procedures. Results from both simulated scenarios and real-data analyses are promising and encourage further exploration in this area.
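The key property claimed above is that an energy test only needs pairwise distances within each variable, so a scalar, a curve, a graph, or a persistence diagram can all be tested against the response in the same way. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses squared sample distance covariance (one of the standard energy statistics) with a permutation test, applies a Bonferroni correction across covariates, and stops when no adjusted p-value falls below a hypothetical level `alpha = 0.05`. The multiplicity correction, the number of permutations, and all function names (`distance_matrix`, `dcov_perm_test`, `select_split_variable`) are illustrative assumptions.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def distance_matrix(xs: Sequence, dist: Callable) -> Matrix:
    """Pairwise distances: the only information an energy test needs from a space."""
    n = len(xs)
    return [[dist(xs[i], xs[j]) for j in range(n)] for i in range(n)]


def double_center(d: Matrix) -> Matrix:
    """A_ij = d_ij - mean_i - mean_j + grand mean (d is symmetric)."""
    n = len(d)
    means = [sum(row) / n for row in d]
    grand = sum(means) / n
    return [[d[i][j] - means[i] - means[j] + grand for j in range(n)]
            for i in range(n)]


def dcov2(a: Matrix, b: Matrix, perm: Sequence[int]) -> float:
    """Squared sample distance covariance (V-statistic) of two double-centred
    matrices, with the second sample reordered by `perm`."""
    n = len(a)
    total = 0.0
    for i in range(n):
        bi = b[perm[i]]
        for j in range(n):
            total += a[i][j] * bi[perm[j]]
    return total / (n * n)


def dcov_perm_test(xs, ys, dist_x, dist_y, n_perm=199, rng=None) -> float:
    """Permutation p-value for H0: X and Y are independent."""
    rng = rng if rng is not None else random.Random()
    a = double_center(distance_matrix(xs, dist_x))
    b = double_center(distance_matrix(ys, dist_y))
    idx = list(range(len(xs)))
    observed = dcov2(a, b, idx)
    exceed = 0
    for _ in range(n_perm):
        perm = idx[:]
        rng.shuffle(perm)  # permuting Y breaks any dependence on X
        if dcov2(a, b, perm) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def select_split_variable(
    covariates: Dict[str, Tuple[Sequence, Callable]],
    ys: Sequence, dist_y: Callable,
    alpha: float = 0.05, n_perm: int = 199, rng=None,
) -> Tuple[Optional[str], Dict[str, float]]:
    """One node step: test each covariate against the response, apply a
    Bonferroni correction (an assumption), then return the most significant
    covariate, or None to stop growing this node."""
    k = len(covariates)
    adjusted = {
        name: min(1.0, k * dcov_perm_test(xs, ys, d, dist_y, n_perm, rng))
        for name, (xs, d) in covariates.items()
    }
    best = min(adjusted, key=adjusted.get)
    return (best if adjusted[best] <= alpha else None), adjusted


def abs_dist(u: float, v: float) -> float:
    return abs(u - v)


def l2_dist(f: Sequence[float], g: Sequence[float]) -> float:
    # Curves sampled on a common grid, compared by a discretised L2 distance.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(f, g)))


# Toy data: y depends on the scalar x1 but not on the "curve" covariate.
rng = random.Random(0)
n = 40
x1 = [rng.random() for _ in range(n)]
curves = [tuple(rng.gauss(0, 1) for _ in range(5)) for _ in range(n)]
y = [v + 0.05 * rng.gauss(0, 1) for v in x1]
best, pvals = select_split_variable(
    {"x1": (x1, abs_dist), "curve": (curves, l2_dist)}, y, abs_dist, rng=rng)
print(best, pvals)
```

In this toy setup one would expect `x1` to be chosen for the split. Only the distance callables change from one covariate type to another: a graph or persistence-diagram covariate would plug in a suitable metric (for example a graph edit distance or a Wasserstein distance between diagrams, neither implemented here) without touching the test itself.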