Stratified Sampling for time dependent survey variables

Bramati, Maria Caterina

The temporal dimension is often neglected in survey designs since, in general, samples are drawn independently across statistical units and across time. However, there are surveys whose target variables are characterized by strong time dependence. Typical examples are monthly surveys for short-term statistics (STS), generally on prices, which use a sample of firms interviewed on a regular basis throughout the year. This scheme corresponds to a panel of statistical units that is fixed during a time interval, generally a year, and may vary in the next time frame as a few firms are dropped and replaced, whilst the majority of the sample (usually consisting of large firms) does not change. In the common practice of national statistical institutes (NSIs), the sampling scheme is not reviewed very often (at least not on a monthly basis), partly for the sake of coherence and comparability of survey variables. Therefore, it is often possible to construct series of observations of a target variable $Y$ for each of the $n$ sampling units $i=1,\ldots,n$ in the time periods $t=1,\ldots,T$. In the classical approach to optimal sampling in the Neyman sense, the sample is allocated proportionally to the cross-sectional dispersion of the target variable: the higher the standard deviation of the variable in a stratum, the larger the sample drawn from it. When the stratification is based on variables such as the economic size of firms, this implies that large firms are surveyed every month, year after year, with a consequently heavy response burden. When information on the target variable $Y$ is collected over a time interval, it is possible to study the temporal pattern of the variable itself in order to improve the sampling design. Ideally, the design takes into account not only the dispersion across population units, but also the variability through time and the ability to forecast future values of the target variable conditional on the values observed in the past.
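For reference, the classical Neyman allocation mentioned above is usually written as follows. The stratum notation is an assumption introduced here and does not appear in the source: $H$ strata indexed by $h$, stratum population sizes $N_h$, cross-sectional standard deviations $S_h$ of $Y$ within stratum $h$, and stratum sample sizes $n_h$ summing to the total sample size $n$.

$$
n_h = n \, \frac{N_h S_h}{\sum_{k=1}^{H} N_k S_k}, \qquad h = 1, \ldots, H .
$$

Under this rule a stratum receives a larger share of the sample when it is large (high $N_h$) or heterogeneous (high $S_h$), which is why strata of large firms tend to be sampled heavily.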
Intuitively, the higher the variance of the target variable over time, the greater the sample size needed for a fixed precision level of the estimates. On the other hand, the stronger the correlation (positive or negative) of present values with past values of $Y$, the better future values can be forecast and the smaller the sample size needed at a given precision. In particular, when dealing with stratified designs based on prices, price variation \emph{across units} alone might not capture the proper variance structure of the target variable, which is known to be autocorrelated and is therefore better described \emph{across the time dimension}. This approach to stratification and sample allocation has the clear advantage of reducing sample sizes, since firms with almost zero price variation over time are less likely to be regularly surveyed. Nowadays, NSIs make more extensive use of administrative data, such as the VAT registry, which also contain records concerning the past. This historical dimension makes it possible to use the variability through time of some auxiliary information as a proxy for the variances of the target variables.
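The intuition above can be illustrated with a minimal sketch. It is not the author's method: it assumes, purely for illustration, that $Y$ follows a stationary AR(1) process within each stratum, so that the standard deviation of the one-step-ahead forecast error is the temporal standard deviation multiplied by $\sqrt{1-\rho^2}$, where $\rho$ is the lag-one autocorrelation. This forecast-error dispersion then replaces the cross-sectional standard deviation in a Neyman-type allocation. The function names, the stratum figures, and the target variance are all hypothetical, and the finite population correction is ignored.

```python
import math


def ar1_forecast_sd(temporal_sd, rho):
    """Std. dev. of the one-step-ahead forecast error for a stationary AR(1).

    Assumption (not from the source): Y is AR(1) with lag-one
    autocorrelation rho, so Var(error) = Var(Y) * (1 - rho**2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return temporal_sd * math.sqrt(1.0 - rho ** 2)


def neyman_allocation(n, sizes, sds):
    """Split a total sample n across strata proportionally to N_h * S_h."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def required_sample_size(sizes, sds, target_variance):
    """Total n giving Var(stratified mean) = target_variance under Neyman
    allocation, ignoring the finite population correction:
    n = (sum_h W_h * S_h)**2 / V, with W_h = N_h / N.
    """
    N = sum(sizes)
    weighted_sd = sum(N_h / N * S_h for N_h, S_h in zip(sizes, sds))
    return weighted_sd ** 2 / target_variance


# Hypothetical strata: population sizes, temporal std. devs., autocorrelations.
sizes = [1000, 500, 100]
temporal_sds = [1.0, 2.0, 4.0]
rhos = [0.0, 0.6, -0.8]

# Dispersion left over once past values have been used for forecasting.
adjusted_sds = [ar1_forecast_sd(s, r) for s, r in zip(temporal_sds, rhos)]

n_plain = required_sample_size(sizes, temporal_sds, target_variance=0.01)
n_adjusted = required_sample_size(sizes, adjusted_sds, target_variance=0.01)
allocation = neyman_allocation(n_adjusted, sizes, adjusted_sds)

print(f"required n ignoring autocorrelation: {n_plain:.1f}")
print(f"required n using forecast-error sd:  {n_adjusted:.1f}")
print("allocation across strata:", [round(a, 1) for a in allocation])
```

In this sketch a strongly autocorrelated stratum, whether the correlation is positive or negative, contributes less to the required sample size than its raw temporal variance would suggest, while a stratum with no autocorrelation is unaffected.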

Stratified Sampling for time dependent survey variables / Bramati, Maria Caterina. - Print. - (2013), pp. 123-124. (Paper presented at the 3rd Italian Conference on Survey Methodology, held in Milan, 26-28 June).