LASSO-penalized clusterwise linear regression modelling: a two-step approach / Di Mari, R; Rocci, R; Gattone, Sa. - In: JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION. - ISSN 0094-9655. - (2023), pp. 1-24. [10.1080/00949655.2023.2220058]
LASSO-penalized clusterwise linear regression modelling: a two-step approach
Rocci, R;
2023
Abstract
In clusterwise regression analysis, the goal is to predict a response variable based on a set of explanatory variables, each with cluster-specific effects. In many real-life problems, the number of candidate predictors is large, and perhaps only a few of them contribute meaningfully to the prediction. A well-known method for variable selection is the LASSO, with the penalty calibrated by minimizing the Bayesian Information Criterion (BIC). However, existing LASSO-penalized estimators are problematic for several reasons. First, only certain types of penalties are considered. Second, the computations may involve approximate schemes. Third, variable selection is usually time-consuming, owing to a complex calibration of the penalty term that may require multiple evaluations of the estimator for each plausible value of the tuning parameter(s). We introduce a two-step approach to fill these gaps. In step 1 (Fit step), we fit LASSO clusterwise linear regressions with a pre-specified level of penalization. In step 2 (Selection step), we perform covariate selection locally, i.e. on the weighted data, with weights given by the posterior probabilities from the previous step. This is done with a generalization of the Least Angle Regression (LARS) algorithm, which permits covariate selection with a single evaluation of the estimator. In addition, both the Fit and Selection steps rely on an Expectation-Maximization (EM) algorithm, fully in closed form, designed for a very general version of the LASSO penalty. The advantages of our proposal, in terms of reduced computation time and accuracy of model estimation and selection, are shown in a simulation study and illustrated with a real data application.
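The two steps can be illustrated with a minimal pure-Python sketch. It is not the authors' estimator, and it makes these assumptions:

- The penalty is a plain LASSO on the slopes, with an unpenalized intercept.
- The penalty is applied to the weighted residual sum of squares, without scaling by the cluster variance.
- Each M-step is solved by weighted coordinate descent with soft-thresholding, rather than by the closed-form updates described in the paper.

The function names `fit_step`, `weighted_lasso` and `weighted_design` are hypothetical. `fit_step` plays the role of the Fit step at a fixed penalty `lam`. `weighted_design` only builds the weighted data on which the Selection step would run; the generalized LARS path itself is not reproduced.

```python
# Hypothetical sketch: not the authors' implementation; plain LASSO penalty assumed.
import math
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def normal_pdf(y, mean, var):
    """Gaussian density of y with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def weighted_lasso(X, y, w, lam, b0, beta, sweeps=20):
    """Coordinate descent for 0.5 * sum_i w_i (y_i - b0 - x_i'beta)^2 + lam * sum_j |beta_j|;
    the intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    beta = list(beta)
    sw = sum(w)
    if sw <= 0.0:  # empty cluster: keep current estimates
        return b0, beta
    r = [y[i] - b0 - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(sweeps):
        # Exact intercept update: shift by the weighted mean residual.
        shift = sum(w[i] * r[i] for i in range(n)) / sw
        b0 += shift
        r = [ri - shift for ri in r]
        for j in range(p):
            d = sum(w[i] * X[i][j] ** 2 for i in range(n))
            if d == 0.0:
                continue
            # Weighted inner product of x_j with the partial residual excluding x_j.
            z = sum(w[i] * X[i][j] * (r[i] + X[i][j] * beta[j]) for i in range(n))
            new = soft_threshold(z, lam) / d
            delta = new - beta[j]
            if delta != 0.0:
                r = [r[i] - X[i][j] * delta for i in range(n)]
                beta[j] = new
    return b0, beta


def fit_step(X, y, K, lam, iters=50, seed=0):
    """EM for a K-component mixture of Gaussian linear regressions whose M-step
    is LASSO-penalized at a fixed, pre-specified level lam."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    tau = [[0.0] * K for _ in range(n)]
    for i in range(n):  # random hard start
        tau[i][rng.randrange(K)] = 1.0
    params = [(0.0, [0.0] * p, 1.0) for _ in range(K)]
    pis = [1.0 / K] * K
    for _ in range(iters):
        # M-step: mixing weight, penalized coefficients and variance per cluster.
        new_params = []
        for k in range(K):
            w = [tau[i][k] for i in range(n)]
            sw = max(sum(w), 1e-12)
            b0, beta = weighted_lasso(X, y, w, lam, params[k][0], params[k][1])
            rss = sum(w[i] * (y[i] - b0 - sum(X[i][j] * beta[j] for j in range(p))) ** 2
                      for i in range(n))
            pis[k] = sw / n
            new_params.append((b0, beta, max(rss / sw, 1e-6)))
        params = new_params
        # E-step: posterior probabilities of cluster membership.
        for i in range(n):
            dens = []
            for b0, beta, var in params:
                mean = b0 + sum(X[i][j] * beta[j] for j in range(p))
                dens.append(normal_pdf(y[i], mean, var))
            dens = [pk * dk for pk, dk in zip(pis, dens)]
            total = sum(dens)
            if total > 0.0:  # otherwise keep the previous posteriors
                tau[i] = [dk / total for dk in dens]
    return pis, params, tau


def weighted_design(X, y, w):
    """Centre X and y at their w-weighted means and scale row i by sqrt(w_i):
    unweighted least squares on the result equals weighted least squares with
    an intercept on (X, y)."""
    n, p = len(X), len(X[0])
    sw = sum(w)
    xbar = [sum(w[i] * X[i][j] for i in range(n)) / sw for j in range(p)]
    ybar = sum(w[i] * y[i] for i in range(n)) / sw
    s = [math.sqrt(wi) for wi in w]
    Xw = [[s[i] * (X[i][j] - xbar[j]) for j in range(p)] for i in range(n)]
    yw = [s[i] * (y[i] - ybar) for i in range(n)]
    return Xw, yw
```

For example, run `pis, params, tau = fit_step(X, y, K=2, lam=0.5)`. The weighted data for cluster `k` are then `weighted_design(X, y, [row[k] for row in tau])`, which could be passed to a standard LARS routine such as scikit-learn's `lars_path`. Centering at the weighted means before the square-root scaling is what allows the intercept to be dropped from the selection path.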
File: Di Mari_LASSO–penalized-clusterwise_2023.pdf (Adobe PDF, 2.13 MB)
Access: archive managers only
Type: publisher's version (published version with the publisher's layout)
License: all rights reserved
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.