DP-FedCMRS: Privacy-Preserving Federated Learning Algorithm to Solve Heterogeneous Data

Bibliographic Details
Main Authors: Yang Zhang, Shigong Long, Guangyuan Liu, Junming Zhang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10910083/
Description
Summary: In federated learning, client data that are not independent and identically distributed (non-IID) can limit both convergence speed and model utility, and gradients can be used to infer the original data, which threatens user privacy. To address these issues, this paper proposes a differential-privacy-based federated learning algorithm with clustered model random selection (DPFedCMRS). The algorithm first groups clients with similar data distributions into the same cluster; then, in each iteration, every cluster randomly selects a model from another cluster so that it learns the characteristics of different data distributions, which addresses data heterogeneity. Differential privacy is applied to the algorithm to achieve sample-level differential privacy and protect each client's data. An adaptive clustering algorithm is also proposed; it combines gradient quantile sparsification to amplify data characteristics and ensure high-accuracy clustering under a strong privacy guarantee. In experiments on three classical datasets, MNIST, FMNIST and CIFAR10, the model trained by DPFedCMRS is 5.98%, 5.16% and 2.05% more accurate, respectively, than existing methods when the privacy budget is 2, indicating that DPFedCMRS improves federated learning performance under highly heterogeneous data distributions and a strong privacy guarantee.
ISSN: 2169-3536
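
The summary names three building blocks: gradient quantile sparsification, sample-level differential privacy, and random selection of another cluster's model. The record gives no implementation details, so the Python sketch below is only one plausible reading and not the authors' method. All function names, the quantile level `q`, the per-sample clipping with Gaussian noise (the common DP-SGD recipe, swapped in here as an assumption), and the uniform choice of a peer cluster are hypothetical.

```python
import numpy as np


def quantile_sparsify(grad, q=0.9):
    """Zero every entry whose magnitude is below the q-th quantile of |grad|.

    Assumption: keeping only the largest entries is how sparsification
    "amplifies data characteristics"; the record does not specify this.
    """
    magnitude = np.abs(grad)
    threshold = np.quantile(magnitude, q)
    return np.where(magnitude >= threshold, grad, 0.0)


def privatize_gradients(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, then average.

    Assumption: DP-SGD-style clipping and noise stand in for the paper's
    sample-level mechanism. per_sample_grads has shape (n_samples, n_params).
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Scale down only the rows whose L2 norm exceeds clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_sample_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / per_sample_grads.shape[0]


def pick_peer_clusters(num_clusters, rng):
    """For each cluster, pick a different cluster uniformly at random.

    Assumption: uniform selection; requires num_clusters >= 2.
    """
    peers = []
    for c in range(num_clusters):
        others = [k for k in range(num_clusters) if k != c]
        peers.append(int(rng.choice(others)))
    return peers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = rng.normal(size=(8, 20))          # 8 samples, 20 parameters
    noisy = privatize_gradients(grads, clip_norm=1.0,
                                noise_multiplier=1.1, rng=rng)
    features = quantile_sparsify(noisy, q=0.9)  # e.g. input to clustering
    print(np.count_nonzero(features), pick_peer_clusters(4, rng))
```

In this sketch the sparsified noisy gradient would serve as the feature vector a server clusters on, and `pick_peer_clusters` would decide whose model each cluster trains against in a given round; how the paper actually combines these steps is not stated in the record.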