There are several reasons why robust regression techniques are useful tools in sampling design. First, when stratified samples are considered, three main issues must be addressed: the sample size, the determination of the strata bounds and the allocation of the sample to the strata. Since the target variable $y$, the objective of the survey, is unknown, auxiliary information $x$, known for the entire population from which the sample is drawn, is used instead. Such information is helpful because it is strongly correlated with the target $y$, although some discrepancies between the two may arise. The use of auxiliary information, combined with the choice of an appropriate statistical model to estimate its relationship with the variable of interest $y$, is crucial for determining the strata bounds, the sample size and the sampling rates for a chosen precision level of the estimates, as shown by Rivest (2002). Nevertheless, this regression-based approach is highly sensitive to the presence of contaminated data. Indeed, outlying observations in both $y$ and $x$ have an explosive impact on the variances, causing strong departures from the optimum sample allocation. We therefore expect increased sample sizes in the strata, wrong allocation of sampling units to the strata and errors in the determination of the strata bounds. Since the key tool for stratified sampling is the measure of scale of $y$ conditional on the knowledge of some auxiliary $x$, a robust approach based on the $S$-estimator of regression is proposed in this paper. The aim is to allow for robust determination of the sample size and strata bounds, together with the optimal sample allocation. To show the advantages of the proposed method, an empirical illustration is provided for Belgian business surveys in the Construction sector.
A skewed population framework, typical for businesses, is considered, with a stratified design comprising one \emph{take-all} stratum and $L-1$ strata. Simulation results are also provided.
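
The sensitivity of the sample size to contaminated data can be illustrated with a minimal sketch. It is not the method of the paper: the normalized median absolute deviation (MAD) is used here as a simple stand-in for a robust scale, in place of the $S$-estimator of regression, and the sample size formula assumes Neyman allocation for a stratified mean without finite population correction. The data, the stratum weights, the target standard error and the function names are all hypothetical.

```python
import math
import statistics


def normalized_mad(values):
    """Median absolute deviation, rescaled to match the standard deviation under normality."""
    center = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - center) for v in values])


def required_sample_size(weights, scales, target_se):
    """Total n for a stratified mean under Neyman allocation (no finite population correction).

    Uses Var(mean) = (sum_h W_h * S_h)^2 / n, solved for n.
    """
    pooled_scale = sum(w * s for w, s in zip(weights, scales))
    return math.ceil((pooled_scale / target_se) ** 2)


# Hypothetical data: two strata with assumed population shares 0.7 and 0.3.
# The last value of the second stratum is a deliberately planted outlier.
strata = [
    [9, 10, 10, 11, 11, 12, 12, 13],
    [48, 49, 50, 51, 52, 53, 55, 500],
]
weights = [0.7, 0.3]
target_se = 0.25  # hypothetical precision requirement on the estimated mean

# Classical scale: sample standard deviation within each stratum.
n_classical = required_sample_size(
    weights, [statistics.stdev(s) for s in strata], target_se
)
# Robust scale: normalized MAD within each stratum (stand-in, not an S-estimator).
n_robust = required_sample_size(
    weights, [normalized_mad(s) for s in strata], target_se
)

print("sample size with classical scale:", n_classical)
print("sample size with robust scale:   ", n_robust)
```

In this toy setting a single outlying unit inflates the classical scale of its stratum, and with it the required sample size, while the robust scale is barely affected.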

Response burden reduction through the use of administrative data and robust sampling / Bramati, Maria Caterina. - PRINT. - 2:(2011), pp. 88-92. (Paper presented at the Società Italiana di Statistica 2011 conference, held in Bologna in June 2011).

Response burden reduction through the use of administrative data and robust sampling

BRAMATI, Maria Caterina
2011

04 Publication in conference proceedings::04c Conference paper in journal

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/407263