Multivariate Statistical Matching Using Graphical Modeling / Conti, Pier Luigi; Marella, Daniela; Vicard, Paola; Vitale, Vincenzina. - In: INTERNATIONAL JOURNAL OF APPROXIMATE REASONING. - ISSN 0888-613X. - 130:(2021), pp. 150-169. [10.1016/j.ijar.2020.12.006]
Multivariate Statistical Matching Using Graphical Modeling
Pier Luigi Conti (Methodology); Daniela Marella (Methodology); Vincenzina Vitale (Methodology)
2021
Abstract
The goal of statistical matching, at a macro level, is the estimation of the joint distribution of variables separately observed in independent samples. The lack of joint information on the variables of interest leads to uncertainty about the data-generating model. In this paper we propose the use of graphical models to deal with statistical matching uncertainty for multivariate categorical variables. Using Bayesian networks in the statistical matching context makes it possible both to introduce extra-sample information on the dependence structure between the variables of interest and to use that information to factorize the joint probability distribution according to the graph decomposition of a multivariate dependence into lower-dimensional components. This representation of the joint probability distribution, which takes advantage of local relationships, simplifies both parameter estimation and the evaluation of statistical matching quality in a multivariate context. A simulation experiment is performed to evaluate the performance of the proposed methodology with and without auxiliary information, and to compare it with the saturated multinomial model in terms of uncertainty reduction. Finally, an application to a real case is provided. Results show a considerable improvement in the quality of statistical matching when the dependence structure is taken into account.
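As a minimal sketch of the idea described in the abstract, and not the authors' implementation, the Python snippet below factorizes a joint distribution of three categorical variables along an assumed graph. The setup is hypothetical: X is observed in both samples, Y only in one sample, Z only in the other, and the graph Y <- X -> Z (Y and Z conditionally independent given X) is an assumption made purely for illustration. All probability values and the function names `joint_under_dag` and `frechet_bounds` are invented for this example. The second function computes the classical Fréchet bounds on p(y, z | x), one common way to express how much the unobserved joint behaviour of Y and Z is left undetermined by the two samples; whether the paper measures uncertainty in exactly this way is not stated in this record.

```python
from itertools import product

# Hypothetical toy inputs (all numbers invented for illustration).
# X is observed in both samples, Y only in sample A, Z only in sample B.
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p(y | x), from sample A
p_z_given_x = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # p(z | x), from sample B


def joint_under_dag(p_x, p_y_given_x, p_z_given_x):
    """Joint p(x, y, z) implied by the assumed graph Y <- X -> Z."""
    joint = {}
    for x in p_x:
        for y, z in product(p_y_given_x[x], p_z_given_x[x]):
            # Factorization along the graph: p(x) * p(y | x) * p(z | x)
            joint[(x, y, z)] = p_x[x] * p_y_given_x[x][y] * p_z_given_x[x][z]
    return joint


def frechet_bounds(p_y_given_x, p_z_given_x, x, y, z):
    """Frechet bounds on p(y, z | x) when only the two conditional margins are known."""
    a = p_y_given_x[x][y]
    b = p_z_given_x[x][z]
    return max(0.0, a + b - 1.0), min(a, b)


joint = joint_under_dag(p_x, p_y_given_x, p_z_given_x)
total = sum(joint.values())
lower, upper = frechet_bounds(p_y_given_x, p_z_given_x, x=0, y=0, z=0)

print(f"sum of factorized joint: {total:.6f}")
print(f"p(y=0, z=0 | x=0) is bounded by [{lower:.4f}, {upper:.4f}]")
```

In this toy setting the value implied by the assumed graph, p(y=0 | x=0) * p(z=0 | x=0), falls inside the Fréchet interval. The interval is what remains when no dependence structure is assumed, while the graph selects a single value within it.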
File | Size | Format | Type | License | Access
---|---|---|---|---|---
2021 JOURN_APPROX_REAS.pdf | 1.63 MB | Adobe PDF | Publisher's version (published version with the publisher's layout) | All rights reserved | Archive managers only