Fed-DPSDG-WGAN: Differentially Private Synthetic Data Generation for Loan Default Prediction via Federated Wasserstein GAN

Bibliographic Details
Main Authors: Padmaja Ramachandra, Santhi Vaithiyanathan
Format: Article
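Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Federated learning; WGAN; CTGAN; differentially private data; oversampling; trivial auto-encoder
Online Access: https://ieeexplore.ieee.org/document/10930895/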
Collection: DOAJ

Description
Lenders typically conduct thorough credit checks before lending money to mitigate credit default risk, so proactively predicting loan defaulters has become increasingly important. However, building robust deep-learning classifiers for loan defaulters requires abundant data, which can compromise individuals’ privacy. We investigate replacing the original data with statistically similar synthetic data as training data for predicting loan defaulters. Among the many approaches to synthetic data generation, generative adversarial networks (GANs) offer the best privacy-utility trade-off. However, limited data and membership inference attacks still significantly threaten the privacy of the generated synthetic data. This paper therefore uses federated learning to generate synthetic data for loan defaulter prediction. Because multiple institutions collaboratively train a shared model without directly exchanging sensitive loan data, this privacy-preserving approach can draw on larger and more diverse datasets, potentially improving prediction accuracy substantially over traditional, siloed approaches. The method adds a further layer of protection against membership inference attacks by using a differentially private optimizer in the Wasserstein GAN (WGAN) architecture. The proposed framework uses a Min-Max Module and autoencoders to build a global representation of the distributed loan data, followed by a federated WGAN trained with a Gaussian noise-added Adagrad optimizer to generate high-fidelity, privacy-preserving synthetic credit risk data. We also use CTGAN to mitigate class imbalance in the synthetic dataset. Finally, we evaluate the utility of the synthetic dataset by testing it with a model trained on the real dataset. With this framework, we achieve a privacy-utility trade-off in a federated environment.
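The abstract names a Gaussian noise-added Adagrad optimizer but does not give its update rule. As a minimal sketch, assuming the widely used DP-SGD recipe (clip each example's gradient to a fixed L2 norm, then add Gaussian noise to the summed gradient) applied to an Adagrad step, the code below shows where the noise enters. The helper names `clip` and `dp_adagrad_step`, and the default values of `lr`, `clip_norm`, and `noise_multiplier`, are illustrative assumptions rather than the authors' settings. The sketch also does not compute the resulting privacy budget ε, which would require a separate privacy accountant.

```python
import math
import random


def clip(grad, clip_norm):
    """Rescale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_adagrad_step(params, per_example_grads, accum, *, lr=0.1, clip_norm=1.0,
                    noise_multiplier=1.1, adagrad_eps=1e-8, rng=random):
    """One differentially private Adagrad update (hypothetical helper).

    Clips each example's gradient, sums them, adds Gaussian noise with standard
    deviation noise_multiplier * clip_norm per coordinate, averages over the
    batch, then applies a standard Adagrad step. Returns new (params, accum).
    """
    batch = len(per_example_grads)
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    sigma = noise_multiplier * clip_norm
    noisy_grad = [
        (sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)) / batch
        for i in range(len(params))
    ]
    # Adagrad keeps a running sum of squared gradients for each coordinate.
    new_accum = [a + g * g for a, g in zip(accum, noisy_grad)]
    new_params = [
        p - lr * g / (math.sqrt(a) + adagrad_eps)
        for p, g, a in zip(params, noisy_grad, new_accum)
    ]
    return new_params, new_accum


# Usage: two examples for a 2-parameter model; the first gradient (norm 5) is clipped to norm 1.
rng = random.Random(0)
params, accum = [0.5, -0.5], [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]
params, accum = dp_adagrad_step(params, grads, accum, rng=rng)
print(params, accum)
```

Because Adagrad divides each coordinate by the root of its accumulated squared gradient, the injected noise is also normalized per coordinate. Under this recipe, a larger `noise_multiplier` strengthens privacy at the cost of noisier updates.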

Extensive experiments across varying numbers of communication rounds, client populations, and privacy budgets show that the proposed framework achieves 95% accuracy. This is a marked improvement over both a non-federated setup (82% accuracy) and FLIGAN (73% accuracy), which offers the same functionality using different techniques.
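The experiments vary the number of communication rounds and participating clients, but the abstract does not name the server's aggregation rule. A minimal sketch, assuming federated averaging (FedAvg) and a toy model whose parameters simply estimate the mean of the clients' records, shows how those two knobs enter a training loop. The names `fedavg`, `local_update`, and `run_federated` are hypothetical. In the paper's setting the parameter vectors would presumably be the WGAN weights and the local step would use the differentially private optimizer; the privacy budget would be tracked separately by a privacy accountant, which this sketch omits.

```python
def fedavg(client_params, client_sizes):
    """Server step: average client parameter vectors, weighted by local record counts."""
    total = sum(client_sizes)
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(len(client_params[0]))
    ]


def local_update(params, records, lr=0.5, steps=5):
    """Hypothetical client step: gradient descent on the mean of 0.5 * ||params - x||^2."""
    p = list(params)
    for _ in range(steps):
        grad = [sum(p[i] - x[i] for x in records) / len(records) for i in range(len(p))]
        p = [pi - lr * gi for pi, gi in zip(p, grad)]
    return p


def run_federated(clients, rounds, init):
    """Run `rounds` communication rounds; only parameter vectors reach the server."""
    global_params = list(init)
    for _ in range(rounds):
        updates = [local_update(global_params, records) for records in clients]
        global_params = fedavg(updates, [len(r) for r in clients])
    return global_params


# Three hypothetical clients holding different numbers of 2-feature records.
clients = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0]],
    [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]],
]
print(run_federated(clients, rounds=5, init=[0.0, 0.0]))  # close to the pooled mean [2.5, 3.0]
```

Weighting each update by its client's record count makes the global parameters converge to the pooled-data answer in this toy case: after five rounds they are within about 1e-7 of the pooled mean [2.5, 3.0], although no client sent its raw records to the server.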

Record Details
ISSN: 2169-3536
Record ID: doaj-art-e61eb74e7caa4082ae0ad0182cd7d419
Volume: 13
Pages: 52069-52084
DOI: 10.1109/ACCESS.2025.3552487
Affiliation: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India (both authors)
ORCID: Padmaja Ramachandra https://orcid.org/0009-0006-1196-1327; Santhi Vaithiyanathan https://orcid.org/0000-0002-4274-7474