A Bayesian Hierarchical Model Approach to Risk Estimation in Statistical Disclosure Limitation / Polettini, Silvia; Stander, Julian. - PRINT. - 3050(2004), pp. 247-261. - LECTURE NOTES IN COMPUTER SCIENCE. [10.1007/978-3-540-25955-8_19].

A Bayesian Hierarchical Model Approach to Risk Estimation in Statistical Disclosure Limitation

POLETTINI, SILVIA
2004

Abstract

When microdata files for research are released, it is possible that external users may attempt to breach confidentiality. For this reason most National Statistical Institutes apply some form of disclosure risk assessment and data protection. Risk assessment first requires a measure of disclosure risk to be defined. In this paper we build on previous work by Benedetti and Franconi (1998) to define a Bayesian hierarchical model for risk estimation. We follow a superpopulation approach similar to Bethlehem et al. (1990) and Rinott (2003). For each combination of values of the key variables we derive the posterior distribution of the population frequency given the observed sample frequency. Knowledge of this posterior distribution enables us to obtain suitable summaries that can be used to estimate the risk of disclosure. One such summary is the mean of the reciprocal of the population frequency, or Benedetti-Franconi risk, but we also investigate others such as the mode. We apply our approach to an artificial sample of the Italian 1991 Census data, drawn by means of a widely used sampling scheme. We report on results of this application and document the computational difficulties that we encountered. The risk estimates that we obtain are sensible, but suggest possible improvements and modifications to our methodology. We discuss these together with potential alternative strategies.
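
The summary named in the abstract, the posterior mean of the reciprocal of the population frequency, can be illustrated with a small numerical sketch. The abstract does not state the posterior distribution, so the code below assumes, purely for illustration, a negative binomial posterior for the population frequency F given the sample frequency f, with support F = f, f + 1, ... and a hypothetical per-cell parameter `p`. The function names, the parameter `p`, and the truncation tolerance are all assumptions introduced here, not taken from the paper.

```python
import math


def negbin_pmf(h: int, f: int, p: float) -> float:
    """P(F = h | f) for h >= f under the ASSUMED negative binomial posterior.

    Equals C(h-1, f-1) * p**f * (1-p)**(h-f), evaluated in log space.
    """
    log_pmf = (
        math.lgamma(h) - math.lgamma(f) - math.lgamma(h - f + 1)
        + f * math.log(p)
        + (h - f) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def posterior_mean_reciprocal(f: int, p: float, tol: float = 1e-10,
                              max_terms: int = 1_000_000) -> float:
    """Approximate E[1/F | f] by summing (1/h) * P(F = h | f) over h >= f.

    The sum stops once the unaccounted posterior mass drops below `tol`
    (which also bounds the truncation error), or after `max_terms` terms.
    """
    if f < 1 or not 0.0 < p < 1.0:
        raise ValueError("need f >= 1 and 0 < p < 1")
    total = 0.0
    mass = 0.0
    for h in range(f, f + max_terms):
        w = negbin_pmf(h, f, p)
        total += w / h
        mass += w
        if 1.0 - mass < tol:
            break
    return total


if __name__ == "__main__":
    # Hypothetical cells: (sample frequency f, assumed parameter p)
    for f, p in [(1, 0.1), (1, 0.5), (2, 0.5)]:
        print(f, p, posterior_mean_reciprocal(f, p))
```

Under this assumed posterior, a sample-unique cell (f = 1) has the closed form E[1/F | f = 1] = (p / (1 - p)) log(1 / p), which gives a quick check on the numerical sum. Other summaries of the same posterior, such as the mode, could be read off `negbin_pmf` in the same way.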
Book: PRIVACY IN STATISTICAL DATABASES
ISBN: 9783540221180
ISBN: 9783540259558
Type: 02 Publication in volume::02a Chapter or Article

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/467225