Bayesian semiparametric disclosure risk estimation via mixed effects log-linear models / Polettini, Silvia; Carota, Cinzia; Filippone, Maurizio; Leombruni, Roberto. - (2014). (Paper presented at the conference Frontiers of Hierarchical Modeling in Observational Studies, Complex Surveys and Big Data: A Conference Honoring Professor Malay Ghosh, held at the Joint Program in Survey Methodology (JPSM), University of Maryland, College Park, Maryland, USA, May 28–31, 2014).
Bayesian semiparametric disclosure risk estimation via mixed effects log-linear models
POLETTINI, SILVIA
2014
Abstract
When releasing data from sample surveys, statistical agencies are obliged to protect the confidentiality of respondents. Disclosure risk must therefore be assessed before any data release. Disclosure may occur when ill-intentioned users exploit information they already hold to link records in the released data to target individuals, by matching on common characteristics (keys) that permit identification. Following the literature, we focus on categorical key variables, which are typical of socio-demographic surveys. Intuitively, individuals who are unique or rare in the population with respect to the key variables are at high risk of disclosure. Indeed, the number of observations that are unique in the sample and also unique, or rare, in the population is commonly used to measure the overall risk of disclosure in the sample data. Many authors have attempted to estimate risk by employing parametric models on cross-classifications of the keys, i.e. multi-way contingency tables [...]
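The risk measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the toy population, the key variables (sex, age band, region), and the function name `count_unique_risk` are all hypothetical, and the sketch assumes the population key combinations are fully known. In practice they are not, which is why the literature resorts to models on the contingency table of the keys.

```python
from collections import Counter

def count_unique_risk(population, sample_idx):
    """Return (number of sample uniques that are also population uniques,
    number of sample uniques) for a cross-classification of categorical keys.

    `population` is a list of key combinations (tuples); `sample_idx` lists
    the positions of the sampled records. Both are illustrative assumptions.
    """
    pop_counts = Counter(population)              # population cell frequencies
    sample = [population[i] for i in sample_idx]
    sample_counts = Counter(sample)               # sample cell frequencies
    # Cells observed exactly once in the sample.
    sample_uniques = [cell for cell, f in sample_counts.items() if f == 1]
    # Among those, cells that also occur exactly once in the population.
    both_unique = sum(1 for cell in sample_uniques if pop_counts[cell] == 1)
    return both_unique, len(sample_uniques)

# Hypothetical toy population: (sex, age band, region).
population = [
    ("F", "30-39", "North"),
    ("F", "30-39", "North"),
    ("M", "30-39", "North"),
    ("M", "60-69", "South"),
    ("F", "60-69", "South"),
    ("F", "60-69", "South"),
    ("M", "30-39", "South"),
    ("M", "30-39", "South"),
]
sample_idx = [0, 2, 3, 5, 6]

risky, sample_unique = count_unique_risk(population, sample_idx)
print(risky, sample_unique)
```

In this toy example every sampled record is unique in the sample, but only two of them are also unique in the population. This shows why sample uniqueness alone overstates risk and why the population frequencies behind each sample-unique cell matter.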