A clustering approach for determining stratification variables in SBS surveys

Ilaria Bombelli; Giorgia Sacco; Alessio Guandalini
2025

Abstract

Under European regulations, many Structural Business Statistics (SBS) surveys must move from the Legal Unit (LU) to the Enterprise (ENT) as the unit of interest. This transition is not trivial, because many National Statistical Institutes (NSIs) still need to provide estimates at the LU level for comparability over time. Adapting the standard LU-based sample design to this shift may therefore require investigating an alternative stratification of the sample. To this end, we propose using a clustering algorithm, K-prototype, to obtain groups of ENT and to assess the importance of each variable in the clustering result. The algorithm is applied to several input datasets obtained by subsetting the ASIA ENT 2021 register, which includes all enterprises carrying out economic activities. The input datasets include ENT operating in different sections of the statistical classification of economic activities in the European Community (NACE) and ENT included in the target population of the Community Innovation Survey (CIS) carried out by ISTAT. The clustering is applied separately to each dataset. From the clustering result, we identify the variables that most influence the obtained partition. These variables are used to define the strata of a new stratification of the ENT. The proposed stratification is used to allocate a sample of the same size as the one drawn under the current stratification. From the sample, we estimate some of the survey's target variables and their coefficients of variation (CVs). These CVs are compared with those obtained under the current stratification. The comparison shows that the efficiency of the estimates is preserved. In addition, the new stratification reduces the number of strata and, consequently, the processing time.
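For readers unfamiliar with K-prototype clustering, the sketch below illustrates the mixed-type dissimilarity on which the algorithm is generally based: the squared Euclidean distance on numeric variables plus a weight gamma times the number of mismatches on categorical variables. It is an illustration under stated assumptions and not the authors' implementation: the variable names, the toy values, the value of gamma and both helper functions are hypothetical and do not come from the paper or from the ASIA register.

```python
from typing import List, Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(
    num_x: Sequence[float],
    cat_x: Sequence[str],
    num_proto: Sequence[float],
    cat_proto: Sequence[str],
    gamma: float,
) -> float:
    """Squared Euclidean distance on numeric variables plus
    gamma times the count of categorical mismatches."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(num_x, num_proto))
    categorical_part = sum(a != b for a, b in zip(cat_x, cat_proto))
    return numeric_part + gamma * categorical_part


def nearest_prototype(
    num_x: Sequence[float],
    cat_x: Sequence[str],
    prototypes: List[Prototype],
    gamma: float,
) -> int:
    """Return the index of the prototype with the smallest dissimilarity."""
    costs = [
        mixed_dissimilarity(num_x, cat_x, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical toy data (assumption, not taken from the paper):
# numeric = (standardised employees, standardised turnover)
# categorical = (NACE section, territorial area)
prototypes: List[Prototype] = [
    ((-0.5, -0.4), ("C", "North")),
    ((1.2, 1.5), ("G", "Centre")),
]
enterprise_num = (1.0, 1.1)
enterprise_cat = ("C", "Centre")

# gamma = 0.5 is an arbitrary illustrative weight
cluster = nearest_prototype(enterprise_num, enterprise_cat, prototypes, gamma=0.5)
print(cluster)
```

In this toy case the enterprise shares its NACE section with the first prototype, but its numeric profile is much closer to the second, so it is assigned to the second cluster. A larger gamma gives categorical variables more influence on the partition, which is one reason the relative importance of the variables is worth assessing after clustering.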
Keywords: Stratified sampling; Structural Business Statistics (SBS); Optimization
01 Journal publication::01a Journal article
A clustering approach for determining stratification variables in SBS surveys / Bombelli, Ilaria; Sacco, Giorgia; Guandalini, Alessio. - In: RIVISTA ITALIANA DI ECONOMIA, DEMOGRAFIA E STATISTICA. - ISSN 0035-6832. - LXXIX, n. 1, January–March 2025 (2025), pp. 259-269.
Files attached to this item
File: A clustering approach for determining stratification variables in SBS surveys.pdf (access restricted to archive managers)
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 355.47 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1746885