Bayesian nonparametric disclosure risk assessment / Favaro, Stefano; Panero, Francesca; Rigon, Tommaso. - In: ELECTRONIC JOURNAL OF STATISTICS. - ISSN 1935-7524. - 15:2(2021), pp. 5626-5651. [10.1214/21-EJS1933]

Bayesian nonparametric disclosure risk assessment

Panero, Francesca
2021

Abstract

Any decision about the release of microdata for public use is supported by the estimation of measures of disclosure risk, the most popular being the number τ1 of sample uniques that are also population uniques. In this context, parametric and nonparametric partition-based models have been shown to have: i) the strength of leading to estimators of τ1 with desirable features, including ease of implementation, computational efficiency and scalability to massive data; ii) the weakness of producing underestimates of τ1 in realistic scenarios, with the underestimation getting worse as the tail behaviour of the empirical distribution of microdata gets heavier. To fix this underestimation phenomenon, we propose a Bayesian nonparametric partition-based model that can be tuned to the tail behaviour of the empirical distribution of microdata. Our model relies on the Pitman–Yor process prior, and it leads to a novel estimator of τ1 that has all the desirable features of partition-based estimators and, in addition, allows underestimation to be reduced by tuning a “discount” parameter. We show the effectiveness of our estimator through its application to synthetic data and real data.
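
As an illustration of the quantity τ1 defined above, the following minimal Python sketch counts sample uniques that are also population uniques on a toy population. It is not the estimator proposed in the paper: it assumes the full population is observed, which is exactly what is unavailable in practice and why τ1 has to be estimated. The function name `count_tau1`, the notion of a "cell" label per record, and the toy data are hypothetical choices made for this example.

```python
from collections import Counter


def count_tau1(population, sample_indices):
    """Count sample uniques that are also population uniques.

    population     : list of cell labels, one per record (hypothetical:
                     a cell groups records that look identical to an intruder).
    sample_indices : positions of the records included in the released sample.
    """
    pop_freq = Counter(population)
    sample_freq = Counter(population[i] for i in sample_indices)
    # A cell contributes to tau_1 when it occurs exactly once in the sample
    # and exactly once in the whole population.
    return sum(
        1
        for cell, m in sample_freq.items()
        if m == 1 and pop_freq[cell] == 1
    )


# Toy population: cells "b" and "d" are population uniques.
population = ["a", "a", "b", "c", "c", "c", "d"]

# The sample holds one record from each of the cells a, b, c, d, so all
# four are sample uniques, but only b and d are also population uniques.
tau1 = count_tau1(population, [0, 2, 3, 6])
print(tau1)
```

In this toy case the sample contains four sample uniques, of which only two are unique in the population; the gap between the two counts is what makes the disclosure risk hard to read off the sample alone.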
Bayesian nonparametrics; data confidentiality; Dirichlet process prior; disclosure risk assessment; empirical Bayes; Pitman–Yor process prior
01 Journal publication::01a Journal article
Files attached to this record
File: Panero-bayesian-nonparametric-disclosure_2021.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 362.75 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1711661
Citations
  • PMC: not available
  • Scopus: 2
  • ISI: 2