The number of categorical observations that are unique in a sample and also unique, or rare, in the population is usually taken as a measure of the overall risk of disclosure in the sample data. Another important and commonly used risk measure is the expected number of correct guesses when each sample unique is matched with an individual chosen at random from the corresponding population cell. Many authors have attempted to estimate these quantities in cross-classifications of the key variables, i.e. multi-way contingency tables of those categorical (key) variables that permit the identification of individuals in the sample. Methods based on parametric assumptions dominate this literature. On the one hand, assuming exchangeability of cells, elaborations of the Poisson model (Poisson-gamma, Poisson-lognormal, etc.) have been extensively applied; on the other hand, relaxing the exchangeability assumption, logistic or log-linear models have been introduced to capture the underlying probability structure of the contingency table. Our Bayesian semi-parametric approach considers a Poisson model whose rates are explained by a mixed-effects log-linear model with Dirichlet process random effects. Suitable specifications of the base measure of the Dirichlet process allow for useful and interesting extensions of many parametric models for disclosure risk estimation. An application to real data is also considered.
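The two risk measures described above can be illustrated with a minimal sketch. It assumes that both the sample count and the population count of every cell of the contingency table are known, which is not the case in practice (estimating the measures from the sample alone is what the models are for). The names `tau1`, `tau2`, `disclosure_risk_measures` and the toy counts are hypothetical and do not come from the paper.

```python
from typing import Sequence, Tuple


def disclosure_risk_measures(
    sample_counts: Sequence[int],
    population_counts: Sequence[int],
) -> Tuple[int, float]:
    """Return (tau1, tau2) for one contingency table, cell by cell.

    tau1: number of cells that are unique in the sample and in the population.
    tau2: expected number of correct guesses when each sample unique is
          matched with a random individual from its population cell,
          i.e. the sum of 1 / F over cells whose sample count is 1.
    Assumption for illustration only: population counts are known.
    """
    if len(sample_counts) != len(population_counts):
        raise ValueError("both tables must have the same number of cells")

    tau1 = 0
    tau2 = 0.0
    for f, F in zip(sample_counts, population_counts):
        if f > F:
            raise ValueError("a sample count cannot exceed its population count")
        if f == 1:  # sample unique
            if F == 1:  # also population unique
                tau1 += 1
            tau2 += 1.0 / F  # chance that a random match in the cell is correct
    return tau1, tau2


# Hypothetical toy table with five cells.
sample_counts = [1, 1, 0, 2, 1]
population_counts = [1, 4, 3, 10, 2]
tau1, tau2 = disclosure_risk_measures(sample_counts, population_counts)
print(tau1, tau2)
```

In this toy table three cells are sample uniques, but only the first is also a population unique; the other two contribute 1/4 and 1/2 to the expected number of correct guesses.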

Semi parametric log-linear models for Bayesian re-identification risk assessment / Carota, Cinzia; Filippone, Maurizio; Leombruni, Roberto; Polettini, Silvia. - Electronic. - (2012), pp. 1-6. (Paper presented at the conference Privacy in Statistical Databases 2012 (PSD 2012), held in Palermo, 26-28 September 2012).

Semi parametric log-linear models for Bayesian re-identification risk assessment

POLETTINI, SILVIA
2012

ISBN: 9788469562925

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/491787