Identification and Formal Privacy Guarantees
70 Pages Posted: 17 Jul 2020 Last revised: 12 Oct 2022
Date Written: October 11, 2022
Abstract
The reliance of empirical economic research on highly sensitive individual-level datasets, together with the increasing availability of public individual-level data (e.g., from social networks, public government records, and directories), creates privacy risks, as adversaries may re-identify anonymized records in sensitive research datasets. To address such risks, computer science research has proposed differential privacy (DP), a formal criterion for evaluating the non-disclosure guarantees of released statistics, along with a related methodology for ensuring such guarantees.
While previous work on DP focused on DP guarantees for specific data statistics, its impact on identification of parameters of interest determined from the population distribution has not been studied. This paper bridges this gap.
In this paper we find that there is a broad class of population parameters that are not identified. Moreover, those parameters are not even partially identified, i.e., one cannot construct a set that would contain their population values. Population parameters of interest can only be characterized as elements of random sets, which requires applying the toolkit of random set theory to analyze their population properties. Identification becomes possible if the target parameter can be deterministically mapped within the random set. In that case, a full exploration of the support of the distribution of the random set of weak limits of differentially private estimators can allow the data curator to select a sequence of instances of differentially private estimators that is guaranteed to converge to the target parameter in probability. We provide a decision-theoretic approach to this selection.
Our results indicate that extending formal privacy guarantees to socio-economic datasets requires further work on integrating data analysis with results and concepts from random set theory, as well as with techniques for partial identification and inference.
Keywords: Differential privacy, average treatment effect, regression discontinuity, random sets, identification
JEL Classification: C35, C14, C25, C13