Estimation from contaminated multi-source data based on latent class models / Guarnera, U.; Varriale, R. - In: STATISTICAL JOURNAL OF THE IAOS. - ISSN 1874-7655. - 32:4(2016), pp. 537-544. [10.3233/SJI-150951]

Estimation from contaminated multi-source data based on latent class models

Guarnera U.; Varriale R.
2016

Abstract

Recently, many statistical institutes have been moving from traditional estimation approaches based on sample survey data to new approaches that exploit the increasing availability of administrative data, driven by the need to reduce response burden and to provide users with more reliable statistical information. In this context, the use of multiple sources for estimation purposes has been receiving increasing attention in Official Statistics. A commonly adopted strategy is to rely on a “hierarchy” of the sources, established through preliminary analyses of the quality of each source. In this work, we propose an alternative approach based on latent variables, which takes advantage of the simultaneous availability of information from different sources. The true values of the target variable are viewed as realizations of a latent (unobserved) variable, and the observed values from the different sources, which may or may not coincide, are treated as imperfect measurements of this latent variable. Under this approach, all the available information is used and “weighted” according to its reliability, and a prediction of the “true” values of a numeric variable of interest is obtained conditional on all the available information.
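The abstract does not spell out the model, so the following is only a minimal sketch of the general idea under a simpler, hypothetical assumption: a Gaussian measurement-error model in which each source reports the latent true value plus independent noise with a source-specific variance. It is not the contamination/latent class model of the paper, and the names `MeasurementModel`, `fit_moments` and `predict` are illustrative. Parameters are estimated by the method of moments, and the prediction is the posterior mean of the latent value, which weights each source by its estimated precision (the inverse of its error variance).

```python
import random
from dataclasses import dataclass
from itertools import combinations
from statistics import fmean
from typing import Optional


@dataclass
class MeasurementModel:
    """Hypothetical Gaussian measurement-error model (not the paper's):
    y* ~ N(mu, tau2) and x_j = y* + e_j with e_j ~ N(0, s2[j])."""
    mu: float
    tau2: float
    s2: list[float]


def fit_moments(data: list[list[float]]) -> MeasurementModel:
    """Method-of-moments fit; data[i][j] is unit i's value in source j.

    Under the model, Cov(x_j, x_k) = tau2 for j != k and
    Var(x_j) = tau2 + s2[j], so two or more sources identify all parameters.
    """
    n, k = len(data), len(data[0])
    if k < 2 or n < 2:
        raise ValueError("need at least two sources and two units")
    cols = [[row[j] for row in data] for j in range(k)]
    means = [fmean(c) for c in cols]

    def cov(a: int, b: int) -> float:
        return sum((u - means[a]) * (v - means[b])
                   for u, v in zip(cols[a], cols[b])) / (n - 1)

    # Every off-diagonal covariance estimates the latent variance tau2.
    tau2 = max(fmean(cov(a, b) for a, b in combinations(range(k), 2)), 1e-9)
    # Error variance = source variance minus latent variance (floored > 0).
    s2 = [max(cov(j, j) - tau2, 1e-9) for j in range(k)]
    return MeasurementModel(mu=fmean(means), tau2=tau2, s2=s2)


def predict(model: MeasurementModel, x: list[Optional[float]]) -> float:
    """Posterior mean E[y* | observed values]; None marks a missing source.

    Each observed value is weighted by its precision 1/s2[j], so less
    reliable sources count less; the prior mean mu gets weight 1/tau2.
    """
    num = model.mu / model.tau2
    den = 1.0 / model.tau2
    for xj, s2j in zip(x, model.s2):
        if xj is not None:
            num += xj / s2j
            den += 1.0 / s2j
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    noise_sd = [0.5, 1.0, 3.0]  # hypothetical: source 3 is the least reliable
    truth = [rng.gauss(10.0, 2.0) for _ in range(2000)]
    data = [[y + rng.gauss(0.0, s) for s in noise_sd] for y in truth]
    model = fit_moments(data)
    print("estimated error variances:", [round(v, 2) for v in model.s2])
    print("true:", round(truth[0], 2), "predicted:", round(predict(model, data[0]), 2))
```

In this simplified setting the "hierarchy" strategy would amount to picking one source per unit, whereas the posterior mean combines all available values, shrinking toward the most reliable ones; a contamination model would additionally allow some reported values to be exactly error-free.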
Keywords: multi-source data; data integration; contamination models; latent variables
Type: 01 Journal publication::01a Journal article
Files attached to this item
Guarnera_Estimation_2016.pdf
Open access
Type: Post-print document (version following peer review and accepted for publication)
License: All rights reserved
Size: 771.53 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1682414
Citations
  • PMC: n/a
  • Scopus: 4
  • ISI: n/a