Fed-DPSDG-WGAN: Differentially Private Synthetic Data Generation for Loan Default Prediction via Federated Wasserstein GAN

Bibliographic Details
Main Authors: Padmaja Ramachandra, Santhi Vaithiyanathan
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10930895/
Description
Summary: Lenders typically conduct thorough credit checks to mitigate credit default risk before lending money, and proactively predicting loan defaulters has become increasingly important. However, building robust deep-learning classifiers for loan default requires abundant data, which can compromise individuals’ privacy. We investigate replacing the original data with statistically similar synthetic data as training data for predicting loan defaulters. Among the many approaches that can generate synthetic data, generative adversarial networks (GANs) offer the best privacy-utility trade-off. However, limited data and membership inference attacks still significantly affect the privacy of the generated synthetic data. This paper uses federated learning to create synthetic data that improves loan defaulter prediction. Because multiple institutions collaboratively train a shared model without directly exchanging sensitive loan data, this privacy-preserving approach can draw on larger and more diverse datasets, potentially leading to significant improvements in prediction accuracy over traditional, siloed approaches. The method also adds a layer of protection against membership inference attacks through a differentially private optimizer in the Wasserstein GAN (WGAN) architecture. The proposed framework uses a Min-Max Module and Autoencoders to build a global representation of the distributed loan data, followed by a Federated WGAN, trained with an Adagrad optimizer with added Gaussian noise, to generate high-fidelity, privacy-preserving synthetic credit risk data. Furthermore, we use CTGAN to mitigate class imbalance in the synthetic dataset. Finally, we test the generated synthetic dataset using a model trained on the real dataset to check its utility. With this novel framework, we attain a privacy-utility trade-off in a federated environment.
Our extensive experiments, which vary the number of communication rounds, the client population, and the privacy budget, show that the proposed framework achieves 95% accuracy. This is a significant improvement over both a non-federated setup, which achieves 82% accuracy, and FLIGAN, which offers the same functionality through different techniques and achieves 73% accuracy.
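
The summary names two mechanisms without detailing them in this record: an Adagrad optimizer with added Gaussian noise, and federated aggregation across institutions. The following is a minimal NumPy sketch of how such pieces are commonly built, not the authors' implementation. The function names (`dp_adagrad_step`, `federated_average`), all hyperparameter values, the per-example gradient clipping step (borrowed from standard DP-SGD practice), and the size-weighted averaging rule are assumptions made for illustration.

```python
import numpy as np


def dp_adagrad_step(params, per_example_grads, accum, rng,
                    lr=0.05, clip_norm=1.0, noise_multiplier=1.0, eps=1e-8):
    """One Adagrad update on clipped, Gaussian-noised gradients.

    Hypothetical helper: clipping and noise scale follow the usual DP-SGD
    recipe, which the summary does not confirm for this paper.
    """
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale

    # Sum the clipped gradients, add Gaussian noise scaled to the clip
    # norm, then average over the batch.
    batch_size = per_example_grads.shape[0]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / batch_size

    # Standard Adagrad: accumulate squared gradients, scale the step by them.
    accum = accum + noisy_grad ** 2
    params = params - lr * noisy_grad / (np.sqrt(accum) + eps)
    return params, accum


def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    weights = np.asarray(client_sizes, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, np.stack(client_params), axes=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical clients, each taking one private step locally.
    local_models = []
    for _ in range(2):
        params, accum = np.zeros(3), np.zeros(3)
        grads = rng.normal(size=(8, 3))  # stand-in per-example gradients
        params, accum = dp_adagrad_step(params, grads, accum, rng)
        local_models.append(params)
    global_params = federated_average(local_models, client_sizes=[100, 300])
    print(global_params.shape)
```

In a sketch like this, only noised parameters leave each client, and a larger `noise_multiplier` corresponds to a stricter privacy budget at the cost of noisier updates.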
ISSN: 2169-3536