Understanding Privacy Leakage From Independently Computed Statistics

Master Thesis


Membership inference attacks have received increasing attention in recent years due to rising privacy concerns in machine learning, as well as their close relationship with the theoretical guarantee of differential privacy. In these attacks, an adversary attempts to decide whether a given individual was a member of the input or training dataset behind some published statistics or a published machine learning model. The success or failure of these attacks provides an intuitive measure of privacy leakage, since identifying an individual as a member of an input dataset may associate them with attributes shared by the group, such as their health status. However, the analysis of existing attacks from a theoretical and statistical perspective is relatively limited. We define and analyze a hypothesis test with an inner product test statistic, which formulates a generic membership inference attack against any published linear statistics. We evaluate the performance of this attack and derive an upper bound on the size of the underlying dataset for which the attack works well. We consider the data distributions that lead to particularly high performance and define linear transformations that improve the attack. Differential privacy offers some protection from membership inference, so we formulate a bound for the attack in terms of differential privacy parameters and show that a lack of differential privacy implies a successful attack in some cases. To evaluate our attack in practice, we apply our analysis to the release of one-way marginals, covariance values, and two-way marginals and verify the predicted behavior experimentally.
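
To make the idea of an inner product test statistic concrete, the following is a minimal sketch of a membership inference attack against published one-way marginals. It is not the exact test analyzed in this thesis. The centering by a known population mean, the independent Bernoulli data model, the parameter values, and the names `inner_product_statistic` and `simulate` are all assumptions made for illustration.

```python
import numpy as np


def inner_product_statistic(target, published_mean, reference_mean):
    """Inner product of the centered target record with the centered release.

    Assumption: the adversary knows (or can estimate) the population mean
    `reference_mean` and uses it to center both vectors.
    """
    return float(np.dot(target - reference_mean, published_mean - reference_mean))


def simulate(n=50, d=2000, trials=200, seed=0):
    """Score one member and one non-member per trial on synthetic binary data."""
    rng = np.random.default_rng(seed)
    # Hypothetical population marginals, one per binary attribute.
    p = rng.uniform(0.2, 0.8, size=d)
    members, non_members = [], []
    for _ in range(trials):
        # n records with d independent Bernoulli(p_j) attributes.
        data = (rng.random((n, d)) < p).astype(float)
        # The published linear statistic: the one-way marginals of the dataset.
        published = data.mean(axis=0)
        # A fresh record drawn from the same population but not in the dataset.
        outsider = (rng.random(d) < p).astype(float)
        members.append(inner_product_statistic(data[0], published, p))
        non_members.append(inner_product_statistic(outsider, published, p))
    return np.array(members), np.array(non_members)


if __name__ == "__main__":
    members, non_members = simulate()
    print("mean score, members:    ", members.mean())
    print("mean score, non-members:", non_members.mean())
```

In this toy setting the number of attributes is much larger than the dataset size, so the scores of members and non-members separate clearly and a threshold between the two means distinguishes them. Increasing `n` relative to `d` shrinks that gap, which illustrates in a simplified setting why the size of the underlying dataset matters for the attack.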