Framework for Secure and Private Hierarchical Federated Learning

Bachelor Thesis

Motivation

Federated learning (FL) is an emerging collaborative machine learning paradigm that addresses critical data privacy issues by enabling clients to train a global model with the help of an aggregation server without revealing their training data [2]. FL can also be more efficient than centralized training because it draws on the computation power and data of potentially millions of clients that train in parallel. However, FL training raises several privacy and security concerns.

Regarding privacy, FL is vulnerable to inference attacks by malicious aggregators that can infer clients’ data from their model updates. To tackle this concern, secure aggregation restricts the central aggregator to learning only the sum or average of the clients’ updates [3,4,5]. Regarding security, standard FL techniques are vulnerable to Byzantine failures, in which a bounded number of clients are malicious and send manipulated local models to the server. The key idea of existing Byzantine-robust FL methods is that the server analyzes the clients’ local model updates and removes suspicious ones before aggregating the remaining updates into the global model [6]. Mitigating privacy and security concerns in FL simultaneously is highly challenging: private FL prohibits access to individual model updates to avoid leakage, while secure FL requires exactly this access to analyze the updates [7].
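
To make the robust-aggregation idea concrete, the following is a minimal plaintext sketch in the spirit of the trust-score rule of FLTrust [6]: each client update is weighted by the clipped cosine similarity to a reference update computed by the server and is rescaled to the reference's magnitude. The function names, the use of flat Python lists, and the handling of zero vectors are assumptions made for illustration only; the sketch provides no privacy.

```python
import math


def dot(a, b):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(a, a))


def trust_weighted_aggregate(client_updates, server_update):
    """Aggregate flat update vectors with cosine-similarity trust scores.

    Assumption: server_update is a non-zero reference update that the
    server computed itself, as in FLTrust-style schemes.
    """
    ref_norm = norm(server_update)
    if ref_norm == 0.0:
        raise ValueError("reference update must be non-zero")

    aggregate = [0.0] * len(server_update)
    total_trust = 0.0
    for update in client_updates:
        update_norm = norm(update)
        if update_norm == 0.0:
            continue  # a zero update has no direction to compare
        cosine = dot(update, server_update) / (update_norm * ref_norm)
        trust = max(0.0, cosine)        # ReLU: opposing updates get weight 0
        scale = ref_norm / update_norm  # bound the magnitude of each update
        for i, value in enumerate(update):
            aggregate[i] += trust * scale * value
        total_trust += trust

    if total_trust == 0.0:
        return [0.0] * len(server_update)  # no update was trusted
    return [value / total_trust for value in aggregate]


if __name__ == "__main__":
    reference = [1.0, 0.0]
    updates = [[2.0, 0.0], [-5.0, 0.0], [0.0, 3.0]]
    print(trust_weighted_aggregate(updates, reference))
```

In the usage example, only the first update points in the reference direction, so the opposing and the orthogonal update receive a trust score of zero. Note that every step reads individual updates in the clear, which is exactly what secure aggregation forbids.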

Compared with the typical FL architecture, applying FL on a cloud-edge-client hierarchical architecture, i.e., hierarchical federated learning (HFL), can train the model faster and achieve better communication/computation trade-offs [8]. However, HFL still suffers from both privacy and security issues. To address this problem, this thesis will propose a privacy-preserving scheme relying on cryptographic protocols. More concretely, this thesis will provide protocols for private and secure Byzantine-robust HFL. To protect against Byzantine attacks, it will employ one of the robust aggregation techniques discussed in the literature [6,8]. It will use secure two-party computation, notably primitives from the CRYPTEN framework [9], to create efficient and private building blocks for secure aggregation in HFL. The protocols will assume two non-colluding semi-honest servers.
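
The two-server setting can be illustrated with additive secret sharing. The sketch below is plain Python and does not use the CRYPTEN API; the ring size of 2^64, the 16-bit fixed-point precision, all function names, and the way shares are summed per edge cluster before the cloud level are assumptions made for illustration. Each client splits its update into two random-looking shares, one per server, so that neither server alone learns anything about an individual update, and only the aggregate is reconstructed.

```python
import secrets

MODULUS = 1 << 64  # assumed ring Z_{2^64} for the shares
SCALE = 1 << 16    # assumed fixed-point scaling factor


def encode(x):
    """Map a float to a fixed-point ring element."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Map a ring element back to a float (upper half = negative)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(update):
    """Split one client update into two additive shares."""
    share0 = [secrets.randbelow(MODULUS) for _ in update]
    share1 = [(encode(x) - r) % MODULUS for x, r in zip(update, share0)]
    return share0, share1


def add_shares(vectors):
    """Element-wise sum in the ring, computed locally by one server."""
    return [sum(column) % MODULUS for column in zip(*vectors)]


def reconstruct(sum0, sum1):
    """Combine the two servers' aggregate shares into the plaintext sum."""
    return [decode((a + b) % MODULUS) for a, b in zip(sum0, sum1)]


def hierarchical_sum(edge_groups):
    """Sum all client updates, first per edge cluster, then at the cloud.

    edge_groups is a list of clusters; each cluster is a list of
    client update vectors (lists of floats).
    """
    edge_sums0, edge_sums1 = [], []
    for clients in edge_groups:
        shares = [share(update) for update in clients]
        edge_sums0.append(add_shares([s0 for s0, _ in shares]))  # server 0
        edge_sums1.append(add_shares([s1 for _, s1 in shares]))  # server 1
    cloud0 = add_shares(edge_sums0)
    cloud1 = add_shares(edge_sums1)
    return reconstruct(cloud0, cloud1)  # only the total is revealed


if __name__ == "__main__":
    groups = [[[0.5, -1.0], [1.5, 2.0]], [[-1.0, 0.25]]]
    print(hierarchical_sum(groups))
```

This sketch covers only the summation step. A Byzantine-robust rule additionally needs comparisons and other non-linear operations on shared values, which is where a 2PC framework such as CRYPTEN [9] would come in.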

Goal

In the first stage, the student will study and analyze different robust aggregation techniques. They will then implement a robust aggregation in the MPC framework CRYPTEN [9], using a combination of arithmetic and binary secret sharing. Finally, the student should convert FL libraries' code into private and secure HFL in a fully automated manner. More concretely, the student should implement an end-to-end compiler from FL libraries like ScionFL [1] to a semi-honest 2PC protocol in the CRYPTEN framework. The integration of CRYPTEN with the FL library will be done in the last stage.

Requirements

High motivation for challenging engineering tasks
At least basic knowledge of secure two-party computation and ML algorithms
Good programming skills in Python and PyTorch
High motivation and the ability to work independently
Knowledge of the English language, Git, LaTeX, etc. goes without saying

References

[1] Yaniv Ben-Itzhak, Helen Möllering, Benny Pinkas, Thomas Schneider, Ajith Suresh, Oleksandr Tkachenko, Shay Vargaftik, Christian Weinert, Hossein Yalame, and Avishay Yanai. ScionFL: Secure Quantized Aggregation for Federated Learning. arXiv preprint arXiv:2210.07376, 2022.
[2] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. In AISTATS, 2017.
[3] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical Secure Aggregation for Privacy-preserving Machine Learning. In CCS, 2017.
[4] James Henry Bell, Kallista A. Bonawitz, Adrià Gascón, Tancrède Lepoint, and Mariana Raykova. Secure Single-Server Aggregation with (Poly)logarithmic Overhead. In CCS, 2020.
[5] Hossein Fereidooni, Samuel Marchal, Markus Miettinen, Azalia Mirhoseini, Helen Möllering, Thien Duc Nguyen, Phillip Rieger, Ahmad-Reza Sadeghi, Thomas Schneider, Hossein Yalame, and Shaza Zeitouni. SAFELearn: Secure Aggregation for private FEderated Learning. In DLS, 2021.
[6] Xiaoyu Cao, Minghong Fang, Jia Liu, and Neil Zhenqiang Gong. FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping. In NDSS, 2021.
[7] Thien Duc Nguyen, Phillip Rieger, Huili Chen, Hossein Yalame, Helen Möllering, Hossein Fereidooni, Samuel Marchal, Markus Miettinen, Azalia Mirhoseini, Farinaz Koushanfar, Ahmad-Reza Sadeghi, Thomas Schneider, and Shaza Zeitouni. FLAME: Taming Backdoors in Federated Learning. In USENIX Security, 2022.
[8] He Yang. H-FL: A Hierarchical Communication-Efficient and Privacy-Protected Architecture for Federated Learning. In IJCAI, 2021.
[9] Brian Knott, Shobha Venkataraman, Awni Hannun, Shubho Sengupta, Mark Ibrahim, and Laurens van der Maaten. CrypTen: Secure Multi-Party Computation Meets Machine Learning. In NeurIPS, 2021.

Supervisors

Hossein Yalame, M.Sc.

Prof. Dr.-Ing. Thomas Schneider ( schneider@encrypto.cs.tu-…)