EZPrivSecFL: Practical Private and Secure Federated Learning Framework
Bachelor Thesis, Master Thesis
Motivation
Federated learning (FL) is an emerging collaborative machine learning paradigm that addresses critical data privacy issues by enabling clients to train a global model with the help of an aggregation server without revealing their training data [1]. FL can be more efficient than traditional centralized training because it uses the computation power and data of potentially millions of clients to train in parallel. However, FL training raises several concerns about privacy and security.
Regarding privacy, FL is vulnerable to inference attacks in which a malicious aggregator infers clients’ data from their model updates. To address this concern, secure aggregation restricts the central aggregator to learning only the sum or average of the clients’ updates [2,3,4]. Regarding security, standard FL techniques are vulnerable to Byzantine failures, in which a bounded number of malicious clients send fake local models to the server. The key idea of existing Byzantine-robust FL methods is that the server analyzes the clients’ local model updates and removes suspicious ones before aggregating the remaining updates into the global model [5,6]. Mitigating both concerns simultaneously is highly challenging: private FL prohibits access to individual model updates to avoid leakage, while secure FL requires exactly this access to analyze the updates [7,8,9]. Moreover, existing FL libraries such as LEAF [10], TensorFlow Federated [11], and FedML [12] do not yet address these privacy and security concerns.
This thesis should develop a framework called EZPrivSecFL, which simultaneously achieves model privacy and security with the help of cryptography. More concretely, this thesis should provide private and secure Byzantine-robust FL in a setting with a dishonest majority of clients. EZPrivSecFL will use the FLTrust aggregation approach [6] to resist Byzantine attacks. It will employ two non-colluding semi-honest servers and use secure two-party computation, specifically ABY2.0 primitives [13], to construct efficient private building blocks for FLTrust secure aggregation.
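For orientation, the FLTrust aggregation rule of [6] can be sketched in plaintext, without any cryptography. The sketch assumes the server holds its own update computed on a small clean root dataset, assigns each client a trust score equal to the ReLU of the cosine similarity between the client update and the server update, rescales each client update to the magnitude of the server update, and averages the rescaled updates weighted by trust score. The function names `dot` and `fltrust_aggregate` and the handling of zero vectors are illustrative assumptions, not part of FLTrust or EZPrivSecFL.

```python
import math


def dot(u, v):
    """Inner product of two equally long vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def fltrust_aggregate(client_updates, server_update):
    """Plaintext sketch of FLTrust-style aggregation (hypothetical helper).

    Returns (aggregated_update, trust_scores).
    """
    g0_norm = math.sqrt(dot(server_update, server_update))
    if g0_norm == 0.0:
        raise ValueError("server update must be non-zero")

    dim = len(server_update)
    weighted_sum = [0.0] * dim
    trust_scores = []
    for g in client_updates:
        g_norm = math.sqrt(dot(g, g))
        if g_norm == 0.0:
            # Assumption of this sketch: a zero update gets no weight.
            trust_scores.append(0.0)
            continue
        cosine = dot(g, server_update) / (g_norm * g0_norm)
        trust = max(cosine, 0.0)  # ReLU-clipped cosine similarity
        trust_scores.append(trust)
        scale = g0_norm / g_norm  # rescale to the server update's magnitude
        for k in range(dim):
            weighted_sum[k] += trust * scale * g[k]

    total = sum(trust_scores)
    if total == 0.0:
        # Assumption of this sketch: no trusted client means no update.
        return [0.0] * dim, trust_scores
    return [x / total for x in weighted_sum], trust_scores
```

With a server update of `[1.0, 0.0]`, a client update pointing in the opposite direction receives trust score 0 and does not influence the result. In EZPrivSecFL, the cosine similarity, the ReLU, and the normalization would have to be evaluated on secret-shared data, which is where the 2PC building blocks come in.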
Goal
The student should implement the FLTrust approach in the MPC framework MOTION2NX [14]. To do so, the combination of Arithmetic Sharing and Garbled Circuits from ABY2.0 [13] should be used. The student should use an approximation of the ReLU function as proposed in [15]. Ultimately, EZPrivSecFL should convert code written for FL libraries into private and secure FL in a fully automated manner. More concretely, the student should implement an end-to-end compiler from FL libraries like TensorFlow Federated [11] to a semi-honest 2PC protocol in the MOTION2NX framework. The EZPrivSecFL framework should make the following main contributions:
- Easy to use: EZPrivSecFL should natively support FL libraries like TensorFlow Federated [11]. It should be easy to use and understand for users without knowledge of cryptography. EZPrivSecFL should provide an efficient and reproducible means for developing and evaluating private and secure FL algorithms.
- Private FL: EZPrivSecFL should keep the input data of each user private from all other users and from the semi-honest servers using secure aggregation.
- Secure FL: EZPrivSecFL should offer Byzantine robustness and allow the incorporation of FLTrust robust aggregation [6].
- Evaluation: EZPrivSecFL should be evaluated against existing poisoning attacks and on the different datasets considered in [6, Section VI]. Its performance should also be compared to that of FLTrust [6].
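To illustrate the private aggregation between two non-colluding servers, the following minimal sketch uses textbook additive secret sharing over the ring of integers modulo 2^64. It is a simplification: ABY2.0 [13] uses its own sharing semantics, and MOTION2NX [14] provides its own API, neither of which is reproduced here. All names (`encode`, `share_update`, `add_shares`, `reconstruct`) and the choice of 16 fractional bits for the fixed-point encoding are assumptions made for illustration.

```python
import secrets

RING_BITS = 64
MOD = 1 << RING_BITS
FRAC_BITS = 16  # fixed-point precision; an illustrative choice


def encode(x):
    """Map a real number to a fixed-point ring element."""
    return round(x * (1 << FRAC_BITS)) % MOD


def decode(v):
    """Map a ring element back to a real number (upper half = negative)."""
    if v >= MOD // 2:
        v -= MOD
    return v / (1 << FRAC_BITS)


def share(value):
    """Split a ring element into two additive shares, one per server."""
    s0 = secrets.randbelow(MOD)
    s1 = (value - s0) % MOD
    return s0, s1


def share_update(update):
    """Client side: share every coordinate of a model update."""
    pairs = [share(encode(x)) for x in update]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def add_shares(share_vectors):
    """Server side: locally add the shares received from all clients."""
    dim = len(share_vectors[0])
    return [sum(v[k] for v in share_vectors) % MOD for k in range(dim)]


def reconstruct(sum0, sum1):
    """Combine both servers' local sums to reveal only the aggregate."""
    return [decode((a + b) % MOD) for a, b in zip(sum0, sum1)]
```

A client calls `share_update` and sends one share vector to each server. Each server applies `add_shares` to the vectors it received, and only the two local sums are combined with `reconstruct`. Each individual share is uniformly random, so a single server learns nothing about any individual update as long as the two servers do not collude.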
In summary, by converting FL library code into private and secure FL, this thesis will significantly lower the entry barrier for FL engineers to use cryptographic MPC protocols in real-world FL applications.
Requirements
- High motivation for challenging engineering tasks
- At least basic knowledge of secure two-party computation and ML algorithms
- Good programming skills in Python, C/C++
- Ability to work independently
- Knowledge of the English language, Git, LaTeX, etc. goes without saying
References
- [1] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. In AISTATS, 2017.
- [2] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical Secure Aggregation for Privacy-preserving Machine Learning. In CCS, 2017.
- [3] James Henry Bell, Kallista A. Bonawitz, Adria Gascon, Tancrede Lepoint, and Mariana Raykova. Secure Single-Server Aggregation with (Poly)logarithmic Overhead. In CCS, 2020.
- [4] Hossein Fereidooni, Samuel Marchal, Markus Miettinen, Azalia Mirhoseini, Helen Möllering, Thien Duc Nguyen, Phillip Rieger, Ahmad-Reza Sadeghi, Thomas Schneider, Hossein Yalame, and Shaza Zeitouni. SAFELearn: Secure Aggregation for private FEderated Learning. In DLS, 2021.
- [5] Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. Provably Secure Federated Learning against Malicious Clients. In AAAI, 2021.
- [6] Xiaoyu Cao, Minghong Fang, Jia Liu, and Neil Zhenqiang Gong. FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping. In NDSS, 2021.
- [7] Ye Dong, Xiaojun Chen, Kaiyun Li, Dakui Wang, and Shuai Zeng. FLOD: Oblivious Defender for Private Byzantine-Robust Federated Learning with Dishonest-Majority. In ESORICS, 2021.
- [8] Xiaoyuan Liu, Hongwei Li, Guowen Xu, Zongqi Chen, Xiaoming Huang, and Rongxing Lu. Privacy-Enhanced Federated Learning Against Poisoning Adversaries. In TIFS, 2021.
- [9] Thien Duc Nguyen, Phillip Rieger, Hossein Yalame, Helen Möllering, Hossein Fereidooni, Samuel Marchal, Markus Miettinen, Azalia Mirhoseini, Ahmad-Reza Sadeghi, Thomas Schneider, and Shaza Zeitouni. FLGUARD: Secure and Private Federated Learning. In preprint arXiv:2101.02281, 2021.
- [10] Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konecný, H. Brendan McMahan, Virginia Smith, and Ameet Talwalkar. LEAF: A Benchmark for Federated Settings. In preprint arXiv:1812.01097, 2018.
- [11] Alex Ingerman and Krzys Ostrowski. TensorFlow Federated, 2019.
- [12] FedML developers. FedML: A Research Library and Benchmark for Federated Machine Learning. In preprint arXiv:2007.13518, 2020.
- [13] Arpita Patra, Thomas Schneider, Ajith Suresh, and Hossein Yalame. ABY2.0: Improved Mixed-Protocol Secure Two-Party Computation. In USENIX Security, 2021.
- [14] Lennart Braun, Rosario Cammarota, and Thomas Schneider. POSTER: A generic hybrid 2PC framework with application to private inference of unmodified neural networks (Extended Abstract). In Privacy in Machine Learning Workshop (PriML@NeurIPS'21). Code: https://encrypto.de/code/MOTION2NX
- [15] Ramy E. Ali, Jinhyun So, and A. Salman Avestimehr. On polynomial approximations for privacy-preserving and verifiable ReLU networks. In preprint arXiv:2011.05530, 2020.
Supervisors
