Multi-source statistics on employment status in Italy, a machine learning approach / Varriale, R.; Alfo', M.. - In: METRON. - ISSN 0026-1424. - 81:(2023), pp. 37-63. [10.1007/s40300-023-00242-7]

Multi-source statistics on employment status in Italy, a machine learning approach

Varriale R.;Alfo' M.

2023

Abstract

In recent decades, National Statistical Institutes have started to produce official statistics by exploiting multiple sources of information (multi-source statistics) rather than a single source, usually a statistical survey. In this context, one of the research projects undertaken by the Italian National Statistical Institute (Istat) concerned methods for producing estimates of employment in Italy using survey data and administrative sources. The former are drawn from the Labour Force Survey conducted by Istat, the latter from several administrative sources that Istat regularly acquires from external bodies. We use machine learning methods to predict individual employment status. This approach is based on decision tree and random forest techniques, which are frequently used to classify large amounts of data. We show how to construct a “new” response variable denoting agreement between the data sources: this approach is shown to maximise the information that can be derived from a machine learning approach in some problematic cases. The methods have been applied using the R software.
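As an illustration of the idea of a response variable that denotes agreement between data sources, here is a minimal sketch. It is not the authors' implementation (the paper reports using R), and the record layout, the field names `survey` and `admin`, the status codes and the `agreement_label` helper are all hypothetical. It only shows how linked records from two sources could be turned into an agree/disagree label that a classifier such as a decision tree or random forest might then be trained to predict.

```python
from collections import Counter

# Hypothetical linked records: one row per person, with the employment status
# reported in the survey and the status derived from administrative data.
# The field names and status codes are assumptions made for this example.
records = [
    {"id": 1, "survey": "employed", "admin": "employed"},
    {"id": 2, "survey": "employed", "admin": "not_employed"},
    {"id": 3, "survey": "not_employed", "admin": "not_employed"},
    {"id": 4, "survey": "not_employed", "admin": "employed"},
    {"id": 5, "survey": "employed", "admin": "employed"},
]


def agreement_label(record):
    """Return 'agree' when both sources report the same status, else 'disagree'."""
    return "agree" if record["survey"] == record["admin"] else "disagree"


# Build the derived response variable and summarise it.
labels = [agreement_label(r) for r in records]
label_counts = Counter(labels)
agreement_rate = label_counts["agree"] / len(records)

print(label_counts)
print(agreement_rate)
```

In a fuller analysis, `labels` would be the target of a classifier and the remaining survey and administrative variables would be its predictors. How the paper actually defines and uses the agreement variable is not stated in this record.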

Year of publication: 2023
Keywords: classification error; employment status; machine learning; multi-source statistics
Type: 01 Journal publication::01a Journal article

Files attached to this product

File	Access	Type	License	Size	Format
Varriale_Multi-source-statistics_2023.pdf	Archive managers only	Publisher's version (published version with the publisher's layout)	All rights reserved	958.44 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1683701
