Estimation from contaminated multi-source data based on latent class models / Guarnera, U.; Varriale, R. - In: STATISTICAL JOURNAL OF THE IAOS. - ISSN 1874-7655. - 32:4(2016), pp. 537-544. [10.3233/SJI-150951]
Estimation from contaminated multi-source data based on latent class models
Guarnera U.; Varriale R.
2016
Abstract
Recently, many statistical institutes have been moving from traditional estimation approaches based on sample survey data to new approaches that exploit the increased availability of administrative data, driven by the need to reduce response burden and to provide users with more reliable statistical information. In this context, the use of multiple sources for estimation purposes has been receiving increasing attention in Official Statistics. A commonly adopted strategy relies on a “hierarchy” of the sources, based on preliminary analyses of the data quality of each source. In this work, we propose an alternative approach based on the concept of latent variables, which takes advantage of the simultaneous availability of information from different sources. The true values of the target variable are viewed as realizations of a latent (unobserved) variable, and the observed values from the different sources, which may or may not coincide, are treated as imperfect measurements of this latent variable. Under this approach, all the available information is used and “weighted” according to its reliability, and a prediction of the “true” values of a numeric variable of interest is obtained conditional on all the available information.
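The idea of weighting each source by its reliability can be illustrated with a deliberately simplified sketch. It is not the latent class model of the paper: it assumes, hypothetically, that the true value y* is Gaussian with a known mean and variance, and that each source reports y* plus independent Gaussian noise with a known, source-specific variance. Under these assumptions, the prediction of y* conditional on all available reports is its posterior mean, in which each report is weighted by the inverse of its error variance. The function `posterior_true_value`, its parameters, and the example values are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def posterior_true_value(
    observed: Sequence[Optional[float]],
    error_vars: Sequence[float],
    prior_mean: float,
    prior_var: float,
) -> Tuple[float, float]:
    """Posterior mean and variance of a latent true value y* under a
    hypothetical Gaussian measurement-error model:
        y*  ~ N(prior_mean, prior_var)
        x_j = y* + e_j,  e_j ~ N(0, error_vars[j]), independent across sources.
    A None entry in `observed` means source j has no value for this unit.
    """
    if len(observed) != len(error_vars):
        raise ValueError("observed and error_vars must have one entry per source")
    # Start from the prior; precision is the reciprocal of a variance.
    precision = 1.0 / prior_var
    weighted_sum = prior_mean / prior_var
    for x, var in zip(observed, error_vars):
        if x is None:
            continue  # a missing source adds no information
        # Each report is weighted by its precision, so more reliable
        # sources (smaller error variance) pull the prediction harder.
        precision += 1.0 / var
        weighted_sum += x / var
    return weighted_sum / precision, 1.0 / precision


# Illustrative unit: a precise source, a noisier source, and a source
# with no value for this unit.
mean, var = posterior_true_value(
    observed=[102.0, 110.0, None],
    error_vars=[1.0, 25.0, 100.0],
    prior_mean=100.0,
    prior_var=400.0,
)
print(f"prediction = {mean:.2f}, posterior variance = {var:.3f}")
# prints: prediction = 102.30, posterior variance = 0.959
```

Here the prediction stays close to the precise source, while the noisier report of 110.0 moves it only slightly. In the setting described in the abstract, the error parameters are not known in advance and would have to be estimated from the data together with the distribution of the latent variable; this sketch simply takes them as given.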
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Guarnera_Estimation_2016.pdf | Open access | Post-print (version following peer review and accepted for publication) | All rights reserved | 771.53 kB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.