Securely Realizing Output Privacy in MPC using Differential Privacy

Seminar Thesis

Abstract

The availability of large volumes of data enables a wide range of possibilities. It can, e.g., help companies accelerate their business by improving their services and products. However, this comes with serious privacy and security implications. While information security has gained significant attention in recent years, privacy is often neglected. Yet collected data may contain personal information that must be protected, not only because of legal regulations, but also because of the personal interests of the data owners.

First efforts have been made towards designing privacy-preserving data analytics, e.g., for machine learning (ML) algorithms. However, many of these approaches overlook practical needs crucial for real-world deployment. For example, relying purely on expensive cryptography often results in impractical performance. Another line of work investigated a concept called “Differential Privacy” (DP) [1], which is substantially more efficient. The idea of DP is to randomize a query’s result so as to mask the influence of any single data record on the output. The higher the required level of privacy, the more randomization is needed, which can heavily distort the accuracy of the analysis.

Beyond cryptography and DP, Google introduced a collaborative approach for training an ML model called “Federated Learning” (FL) in 2017. In FL, data owners keep their data locally, so FL seems to solve the issue of data privacy at first glance. However, research has shown that it still has shortcomings with respect to privacy: although no training data is shared directly, the exchanged model updates still allow inferring information about the underlying training data.
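To make the DP building block concrete, the following is a minimal sketch of the Laplace mechanism from [1], assuming a simple counting query with sensitivity 1; the dataset, threshold, and privacy budgets are illustrative placeholders, not part of the proposed work.

```python
# Minimal sketch of the Laplace mechanism [1]: perturb a query result with noise
# scaled to the query's sensitivity divided by the privacy budget epsilon.
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return the query result with Laplace noise of scale sensitivity/epsilon."""
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Illustrative counting query: adding or removing one record changes the count
# by at most 1, so the sensitivity is 1. Smaller epsilon means stronger privacy.
data = [23, 45, 31, 62, 58, 40]
true_count = sum(1 for x in data if x > 40)
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: noisy count = {laplace_mechanism(true_count, 1.0, eps):.2f}")
```

Running this shows the accuracy trade-off described above: as epsilon shrinks, the noise scale grows and the released count deviates further from the true value.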

In this thesis, we want to combine the best of three techniques, namely secure multi-party computation (MPC), DP, and FL, to effectively and efficiently mitigate inference attacks in distributed ML.

Previous work [2] has already introduced a system called CaPC that combines MPC and DP. It realizes private inference with multiple parties, each holding its own model. Each party's model is evaluated with an arbitrary privacy-preserving inference protocol before so-called garbled circuits are used to determine a concrete label. The resulting labels of all queried parties are then aggregated using arithmetic secret sharing before a semi-trusted third party adds noise to achieve DP. Moreover, [8] combines DP with homomorphic encryption to thwart inference attacks, which is, however, computationally expensive.
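The following toy sketch illustrates the general pattern behind such an aggregation step; it is not CaPC's actual protocol, and the ring size, class count, votes, and noise scale are all illustrative assumptions. Each party additively secret-shares a one-hot label vote, only the share sums are reconstructed, and a semi-trusted party adds Laplace noise before releasing the winning label.

```python
import random
import numpy as np

MOD = 2**32        # ring Z_{2^32} for additive (arithmetic) secret sharing
NUM_CLASSES = 3    # toy label space

def share(value: int, n: int) -> list[int]:
    """Split an integer into n additive shares modulo MOD."""
    parts = [random.randrange(MOD) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts

votes = [1, 1, 2, 1, 0]   # each party's plaintext label vote (never pooled in the clear)
n = len(votes)

# Party i secret-shares its one-hot vote vector; party j only ever sees its own
# shares, which it sums locally into one share of the global label histogram.
local_sums = [[0] * NUM_CLASSES for _ in range(n)]
for i, v in enumerate(votes):
    for c in range(NUM_CLASSES):
        for j, s in enumerate(share(1 if c == v else 0, n)):
            local_sums[j][c] = (local_sums[j][c] + s) % MOD

# Reconstructing the share sums reveals only the aggregate histogram, to which a
# semi-trusted party adds Laplace noise before releasing just the majority label.
histogram = [sum(local_sums[j][c] for j in range(n)) % MOD for c in range(NUM_CLASSES)]
noisy = [h + np.random.laplace(scale=2.0) for h in histogram]
print("DP-aggregated label:", int(np.argmax(noisy)))
```

Note that no individual vote is ever reconstructed; only the noisy argmax leaves the aggregation, which is the property that limits what inference attacks can learn.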

In this thesis, the first step would be to investigate related work on FL with MPC, homomorphic encryption, and DP, starting with [2-5,8]. Next, the question of how to realize DP in MPC protocols should be investigated, starting with [6-7]. Based on these insights and potentially discovered shortcomings of previous works, the requirements for a new, improved FL scheme that is private, accurate, and efficient shall be finalized. In the next step, a protocol that meets these requirements should be designed, optimized, and implemented. It should securely and efficiently aggregate model updates by means of DP and MPC. The implementation shall be benchmarked with respect to efficiency as well as effectiveness against inference attacks, and compared to previous work.
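As one possible starting point for such a design, the following is a minimal sketch, under strongly simplifying assumptions, of how DP and MPC-style secure aggregation could interact in FL: each client clips its model update, adds a local share of Gaussian noise so that only the sum carries the full DP noise, and applies pairwise masks that cancel in the aggregate. The client count, dimensions, clipping bound, and noise scale are illustrative placeholders, not a finalized protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 5, 4
CLIP, SIGMA = 1.0, 0.8   # L2 clipping bound and target total noise stddev

updates = [rng.normal(size=dim) for _ in range(n_clients)]

# Pairwise additive masks: client i adds m_ij for j > i and subtracts it for
# j < i, so all masks cancel in the sum (a toy version of secure aggregation).
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked_update(i: int) -> np.ndarray:
    u = updates[i]
    u = u * min(1.0, CLIP / max(np.linalg.norm(u), 1e-12))  # clip to bound sensitivity
    # Local noise share: n independent shares sum to total variance SIGMA^2.
    u = u + rng.normal(scale=SIGMA / np.sqrt(n_clients), size=dim)
    for j in range(n_clients):
        if j > i:
            u = u + masks[(i, j)]
        elif j < i:
            u = u - masks[(j, i)]
    return u

# The server sees only masked updates; their sum equals the clipped updates
# plus Gaussian noise of total stddev SIGMA, since the masks cancel.
aggregate = sum(masked_update(i) for i in range(n_clients)) / n_clients
print("noisy average update:", np.round(aggregate, 3))
```

Because the masks cancel only in the sum, the server learns nothing beyond the noisy aggregate. A real protocol would additionally need dropout handling, finite-field arithmetic instead of floats, and a careful privacy analysis, which is exactly the design space this thesis is meant to explore.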

References

Supervisors