Variable selection and importance in presence of high collinearity: an application to the prediction of lean body mass from multi-frequency bioelectrical impedance

Camillo Cammarota; Alessandro Pinto
2021

Abstract

In prediction problems, both the response and the covariates may be highly correlated with a second group of influential regressors, which can be regarded as background variables. An important challenge is to perform variable selection and importance assessment among the covariates in the presence of these variables. A clinical example is the prediction of lean body mass (response) from bioimpedance (covariates), where anthropometric measures play the role of background variables. We introduce a reduced dataset in which the variables are defined as the residuals with respect to the background, and perform variable selection and importance assessment in both linear and random forest models. Using a clinical dataset of multifrequency bioimpedance, we show that this method is effective in selecting the most relevant predictors of lean body mass beyond anthropometry.
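The idea of a reduced dataset of residuals can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single background variable, uses synthetic data, and ranks covariates by the correlation between residuals instead of fitting the linear and random forest models named in the abstract. All names (`residuals`, `correlation`, `x1`, `x2`) are hypothetical.

```python
import random


def residuals(y, z):
    """Residuals of a least-squares regression of y on z with an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    mean_z = sum(z) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    slope = szy / szz
    return [(yi - mean_y) - slope * (zi - mean_z) for yi, zi in zip(y, z)]


def correlation(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


# Synthetic data (assumption): one background variable drives everything.
rng = random.Random(0)
n = 500
background = [rng.gauss(0, 1) for _ in range(n)]
# Both covariates are strongly collinear with the background.
covariates = {
    "x1": [2.0 * b + rng.gauss(0, 0.3) for b in background],
    "x2": [2.0 * b + rng.gauss(0, 0.3) for b in background],
}
# The response depends on the background and on the part of x1
# that the background does not explain; x2 adds nothing new.
response = [
    3.0 * b + 1.5 * (x - 2.0 * b) + rng.gauss(0, 0.2)
    for b, x in zip(background, covariates["x1"])
]

# Raw correlations: both covariates look important because of the background.
raw = {name: correlation(response, x) for name, x in covariates.items()}

# Reduced dataset: replace response and covariates by their residuals
# with respect to the background variable.
reduced_response = residuals(response, background)
reduced = {name: residuals(x, background) for name, x in covariates.items()}

# Correlations in the reduced dataset measure what each covariate
# contributes beyond the background.
partial = {name: correlation(reduced_response, r) for name, r in reduced.items()}
ranking = sorted(partial, key=lambda name: abs(partial[name]), reverse=True)

print("raw correlations:    ", raw)
print("reduced correlations:", partial)
print("ranking beyond background:", ranking)
```

In this toy setting the raw correlations cannot separate the two covariates, while the residual-based ones can. With several background variables the same step would use a multiple regression, and the reduced dataset could then be passed to any selection or importance procedure.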
Variable selection; importance; linear model; random forests; bioimpedance; multi-frequency; anthropometric variables; lean body mass
01 Journal publication::01a Journal article
Variable selection and importance in presence of high collinearity: an application to the prediction of lean body mass from multi-frequency bioelectrical impedance / Cammarota, Camillo; Pinto, Alessandro. - In: JOURNAL OF APPLIED STATISTICS. - ISSN 1360-0532. - (2021). [10.1080/02664763.2020.1763930]
Files attached to this item

Cammarota_postprint_Variable-selection_2020.pdf
Access: archive managers only
Note: https://www.tandfonline.com/doi/full/10.1080/02664763.2020.1763930
Type: Post-print (version after peer review, accepted for publication)
License: All rights reserved
Size: 379.14 kB
Format: Adobe PDF

Cammarota_Variable-selection_2021.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.89 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1394665