Assessing Bayesian Semi‐Parametric Log‐Linear Models: An Application to Disclosure Risk Estimation / Carota, Cinzia; Filippone, Maurizio; Polettini, Silvia. - In: INTERNATIONAL STATISTICAL REVIEW. - ISSN 0306-7734. - (2022), pp. 165-183. [10.1111/insr.12471]

Assessing Bayesian Semi‐Parametric Log‐Linear Models: An Application to Disclosure Risk Estimation

Polettini, Silvia
2022

Abstract

We propose a method for identifying models with good predictive performance in the family of Bayesian log-linear mixed models with Dirichlet process random effects for count data. Because these models are widely applicable, assessing their performance is crucial in many fields, including disclosure risk estimation, which is the focus of the present work. Rather than assessing models on the whole contingency table, we target the specific objective of the analysis and propose a two-stage model selection procedure that aims to limit a form of bias arising in the process of model selection. Our proposal combines two different criteria: at the first stage, a path in the model search space is identified through a strongly penalized log-likelihood; at the second, a small number of semi-parametric models are evaluated through a context-dependent, score-based information criterion. Tested on a variety of contingency tables, our method identifies models with good predictive performance in a few steps, even for large tables with many sampling and structural zeros. We carefully discuss the proposed method in the context of the literature on model assessment and place the illustrative application within the recent debate on statistical disclosure limitation. Finally, we provide examples of further applications in different research areas.
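The control flow of a two-stage procedure like the one described above can be illustrated with a generic skeleton. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the path is built, which penalty is used, or how many models enter the second stage. It assumes that (i) a model is represented as a set of interaction terms, (ii) the stage-one path is built by greedy forward addition of terms, stopping when the penalized log-likelihood no longer improves, and (iii) stage two applies a second, objective-specific criterion (lower is better) to the last few models on the path. The functions `stage_one_path` and `stage_two_select`, the callables `penalized_loglik` and `score`, and the shortlist size `n_last` are all hypothetical; in a real analysis `penalized_loglik` would fit a parametric log-linear model and `score` would fit and assess its semi-parametric (Dirichlet process) counterpart.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical representation: a model is a set of interaction terms, e.g. {"A:B"}.
Model = FrozenSet[str]


def stage_one_path(
    candidate_terms: Sequence[str],
    penalized_loglik: Callable[[Model], float],
    start: Model = frozenset(),
) -> List[Model]:
    """Stage 1 (assumed greedy forward search): repeatedly add the term that
    most increases the user-supplied penalized log-likelihood; stop when no
    single addition improves it. Returns every visited model (the path)."""
    current = frozenset(start)
    path = [current]
    best = penalized_loglik(current)
    while True:
        remaining = [t for t in candidate_terms if t not in current]
        if not remaining:
            break
        # Score each one-term extension; ties are broken by term name.
        new_score, term = max((penalized_loglik(current | {t}), t) for t in remaining)
        if new_score <= best:
            break
        current = current | {term}
        best = new_score
        path.append(current)
    return path


def stage_two_select(
    path: List[Model],
    score: Callable[[Model], float],
    n_last: int = 3,
) -> Model:
    """Stage 2: evaluate only the last n_last models on the path with a second,
    objective-specific criterion (assumed: lower is better) and return the best."""
    shortlist = path[-n_last:]
    return min(shortlist, key=score)


if __name__ == "__main__":
    # Made-up per-term gains and penalty, used only to exercise the control flow.
    gains = {"A:B": 10.0, "A:C": 4.0, "B:C": 0.5}
    penalty = 2.0

    def toy_pen_loglik(m: Model) -> float:
        return sum(gains[t] for t in m) - penalty * len(m)

    def toy_score(m: Model) -> float:
        return abs(len(m) - 1)

    path = stage_one_path(["A:B", "A:C", "B:C"], toy_pen_loglik)
    print([sorted(m) for m in path])  # -> [[], ['A:B'], ['A:B', 'A:C']]
    print(sorted(stage_two_select(path, toy_score)))  # -> ['A:B']
```

Restricting the second criterion to a short list mirrors the abstract's point that only a small number of semi-parametric models is evaluated; this is convenient when each semi-parametric fit, typically obtained by MCMC, is costly.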
Keywords: Bayesian log-linear models, mixed effects, Dirichlet process, disclosure limitation
Type: 01 Journal publication::01a Journal article
Attached file: insr.12471.pdf (publisher's version, with the publisher's layout; open access; Adobe PDF, 204.06 kB). License: all rights reserved.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1569690