Latent variable models and machine learning for prediction of employment status in Italy / Varriale, Roberta; Alfò, Marco. - (2022), pp. 75-75. (Paper presented at the 16th International Conference on Computational and Financial Econometrics and 15th International Conference of the ERCIM (European Research Consortium for Informatics and Mathematics) Working Group on Computational and Methodological Statistics, held in London).
Latent variable models and machine learning for prediction of employment status in Italy
Roberta Varriale;Marco Alfò
2022
Abstract
The increasing availability of large amounts of multi-source information in national statistical institutes (NSIs) makes it necessary to investigate new methodological approaches to producing estimates, based on combining primary and secondary data. Primary data are collected by NSIs for statistical purposes, usually through a sample survey. Secondary data, such as administrative registers and big data, are neither collected by NSIs nor collected for statistical purposes; still, NSIs may use them to produce statistics. For qualitative/categorical data, several methodological approaches can produce estimates that exploit all available information. Latent variable models may help to account explicitly for deficiencies in the measurement process of both survey and administrative sources. Machine learning techniques are frequently used to classify large amounts of data. The use of hidden Markov models and machine learning methods to predict individual employment status is described in the context of labour statistics. The relevant data may be drawn from the labour force survey conducted by Istat and from several administrative sources that Istat regularly acquires from external bodies.
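As a rough illustration of the latent variable idea mentioned in the abstract, the sketch below treats the true employment status as a hidden two-state Markov chain observed through two error-prone indicators, one from a survey and one from an administrative source. It is not the authors' model: the two-state setup, the conditional independence of the two sources given the true status, the handling of missing survey months, and every probability value are assumptions made up for this example.

```python
import numpy as np

# All numbers below are made-up illustrative values, not estimates from the paper.
STATES = ("employed", "not_employed")
initial = np.array([0.6, 0.4])                 # P(true status at t = 0)
transition = np.array([[0.95, 0.05],           # P(status_t | status_{t-1})
                       [0.10, 0.90]])
# Misclassification matrices: rows = true status, columns = reported category.
emission_survey = np.array([[0.97, 0.03],
                            [0.04, 0.96]])
emission_admin = np.array([[0.90, 0.10],
                           [0.02, 0.98]])
MISSING = -1  # hypothetical code for "source not observed in this period"


def observation_likelihood(survey, admin):
    """Per-period likelihood of both sources given each latent status."""
    like = np.ones((len(survey), len(STATES)))
    for obs, emission in ((survey, emission_survey), (admin, emission_admin)):
        for t, y in enumerate(obs):
            if y != MISSING:
                like[t] *= emission[:, y]  # sources assumed independent given status
    return like


def posterior_states(survey, admin):
    """Smoothed P(status_t | all observations) via scaled forward-backward."""
    like = observation_likelihood(survey, admin)
    n = len(like)
    alpha = np.zeros_like(like)
    beta = np.ones_like(like)
    alpha[0] = initial * like[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):                      # forward pass
        alpha[t] = (alpha[t - 1] @ transition) * like[t]
        alpha[t] /= alpha[t].sum()
    for t in range(n - 2, -1, -1):             # backward pass
        beta[t] = transition @ (like[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


# Six months for one person: survey observed in three months, register in all.
survey = [0, MISSING, MISSING, 0, MISSING, 1]
admin = [0, 0, 1, 0, 0, 1]
posterior = posterior_states(survey, admin)
predicted = [STATES[k] for k in posterior.argmax(axis=1)]
```

In this sketch the predicted status in each month is the category with the highest smoothed posterior probability, so an isolated disagreement between the sources is weighed against both the assumed error rates and the persistence of the latent chain. In practice the transition and misclassification probabilities would be estimated from the data rather than fixed by hand.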