
Repeated weighted hold-out for variable selection in linear regression / DI CIACCIO, Agostino. - PRINT. - 1:(2012), pp. 135-135. (5th International Conference of the ERCIM (European Research Consortium for Informatics and Mathematics) Working Group on Computing & Statistics (ERCIM 2012), Oviedo (Spain), 1-3 December).

Repeated weighted hold-out for variable selection in linear regression

DI CIACCIO, AGOSTINO
2012

Abstract

Variable selection criteria in a linear regression model are analyzed with respect to their consistency and efficiency properties. A Cross-Validation (CV) method, which we call Repeated Weighted Hold-Out, is introduced. Like the well-known Repeated Hold-Out, this method estimates the Prediction Error (PE), with the difference that, at each iteration, the units extracted into the training set have their probability of being extracted again reduced. This weighting makes the procedure similar to a repeated K-fold Cross-Validation, in which the units appear with the same frequency in the training sets, while remaining more flexible than K-fold CV. It is known that a criterion based exclusively on the estimate of the PE is efficient but not consistent. However, CV has a useful property: small training sets introduce a large penalty into the criterion, while large training sets introduce a small one. Using this property, it is possible to obtain an inherently penalized estimator of the PE. With an appropriate choice of training-set size, this method dominates several criteria for variable selection.
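The procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting rule, so everything specific here is an assumption made for illustration: the multiplicative `decay` applied to the weight of each unit drawn into the training set, the weighted sampling without replacement (Efraimidis-Spirakis keys), the mean squared error as the PE measure, and all function names (`weighted_sample`, `ols_fit`, `repeated_weighted_holdout`, `best_subset`). It is not the author's implementation.

```python
import math
import random


def weighted_sample(weights, m, rng):
    """Draw m distinct indices, favouring units with larger weight
    (Efraimidis-Spirakis keys, computed on the log scale)."""
    keys = [(math.log(1.0 - rng.random()) / w, i) for i, w in enumerate(weights)]
    keys.sort(reverse=True)
    return [i for _, i in keys[:m]]


def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
         + [sum(r[a] * t for r, t in zip(rows, y))] for a in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def repeated_weighted_holdout(X, y, train_size, n_iter=200, decay=0.5, seed=0):
    """Return (estimated PE, number of times each unit was used for training)."""
    rng = random.Random(seed)
    n = len(y)
    weights = [1.0] * n
    counts = [0] * n
    errors = []
    for _ in range(n_iter):
        train = sorted(weighted_sample(weights, train_size, rng))
        in_train = set(train)
        test = [i for i in range(n) if i not in in_train]
        beta = ols_fit([X[i] for i in train], [y[i] for i in train])
        errors.append(
            sum((y[i] - predict(beta, X[i])) ** 2 for i in test) / len(test))
        for i in train:
            weights[i] *= decay  # assumed rule: drawn units become less likely
            counts[i] += 1
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid underflow
    return sum(errors) / n_iter, counts


def best_subset(X, y, subsets, train_size, **kwargs):
    """Score each candidate tuple of column indices; return the one with lowest PE."""
    scores = {}
    for s in subsets:
        Xs = [[row[j] for j in s] for row in X]
        scores[s] = repeated_weighted_holdout(Xs, y, train_size, **kwargs)[0]
    return min(scores, key=scores.get), scores
```

Because the splits depend only on the seed and the weights, every candidate subset in `best_subset` is scored on the same sequence of training sets. The `train_size` argument plays the role described in the abstract: smaller training sets penalize larger models more heavily.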
ISBN: 9788493782221

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/507328