Fed-DPSDG-WGAN: Differentially Private Synthetic Data Generation for Loan Default Prediction via Federated Wasserstein GAN

Lenders typically conduct thorough credit checks to mitigate credit default risk before lending money. Proactively predicting loan defaulters has become increasingly important. However, creating robust deep-learning algorithms that classify loan defaulters requires abundant data, potentially comprom...

Bibliographic Details
Main Authors:	Padmaja Ramachandra, Santhi Vaithiyanathan
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Federated learning; WGAN; CTGAN; differentially private data; oversampling; trivial auto-encoder
Online Access:	https://ieeexplore.ieee.org/document/10930895/

collection	DOAJ
description	Lenders typically conduct thorough credit checks to mitigate credit default risk before lending money. Proactively predicting loan defaulters has become increasingly important. However, creating robust deep-learning algorithms that classify loan defaulters requires abundant data, potentially compromising individuals’ privacy. We investigate the potential of replacing the original data with statistically similar synthetic data to be used as training data to predict loan defaulters. Generative adversarial networks stand out with the best privacy-utility trade-off among the many approaches that can generate synthetic data. However, limited data and membership inference attacks still significantly affect the privacy of the synthetic data generated. This paper uses federated learning to create synthetic data to improve loan defaulter prediction. By allowing multiple institutions to collaboratively train a shared model without directly exchanging sensitive loan data, this privacy-preserving approach allows for the utilization of more extensive and diverse datasets, potentially leading to significant improvements in prediction accuracy compared to traditional, siloed approaches. This method also builds a layer of privacy to safeguard against membership inference attacks using a differentially private optimizer layer in the WGAN architecture. The proposed framework utilizes a Min-Max Module and Autoencoders for global representation of distributed loan data, followed by a Federated WGAN, enhanced with a Gaussian noise-added Adagrad optimizer, to generate high-fidelity, privacy-preserved synthetic credit risk data. Furthermore, we use CTGAN to mitigate class imbalance in the synthetic dataset. Finally, we test the generated synthetic dataset using a model trained on the real dataset to check its utility. With this novel framework, we could attain a privacy-utility trade-off in a federated environment. Our extensive experimentation, which includes a variety of communication rounds, client populations, and privacy budgets, demonstrates that the proposed framework achieves an impressive 95% accuracy. The proposed model marks a significant improvement over both a non-federated setup, which achieves 82% accuracy, and FLIGAN with 73% accuracy, which possesses the same functionalities with different techniques.
format	Article
id	doaj-art-e61eb74e7caa4082ae0ad0182cd7d419
institution	OA Journals
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-e61eb74e7caa4082ae0ad0182cd7d419; 2025-08-20T01:50:23Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 52069-52084; DOI 10.1109/ACCESS.2025.3552487; IEEE document 10930895; Padmaja Ramachandra (https://orcid.org/0009-0006-1196-1327) and Santhi Vaithiyanathan (https://orcid.org/0000-0002-4274-7474), both School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
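Note on the "Gaussian noise-added Adagrad optimizer" mentioned in the description: the record does not say how the noise is applied, so the following is only a minimal sketch of one common way to do it (clip each example's gradient, add Gaussian noise to the sum, then take an ordinary Adagrad step). It is not the authors' implementation. The function names and the parameters `max_norm` and `noise_multiplier` are hypothetical, and no privacy accounting is included.

```python
import math
import random


def clip_by_l2_norm(grad, max_norm):
    """Scale grad down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_adagrad_step(params, per_example_grads, accum, lr=0.1, max_norm=1.0,
                    noise_multiplier=1.0, eps=1e-8, rng=None):
    """One Adagrad update on a clipped, Gaussian-noised mean gradient.

    All names and defaults here are illustrative assumptions.
    Returns (new_params, new_accum).
    """
    rng = rng or random.Random()
    n = len(per_example_grads)
    # 1. Clip each example's gradient to bound its influence on the update.
    clipped = [clip_by_l2_norm(g, max_norm) for g in per_example_grads]
    # 2. Sum the clipped gradients, add noise scaled to the clip bound, average.
    sigma = noise_multiplier * max_norm
    noisy_mean = [
        (sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)) / n
        for i in range(len(params))
    ]
    # 3. Adagrad: accumulate squared gradients and scale each coordinate's step.
    new_accum = [a + g * g for a, g in zip(accum, noisy_mean)]
    new_params = [
        p - lr * g / (math.sqrt(a) + eps)
        for p, g, a in zip(params, noisy_mean, new_accum)
    ]
    return new_params, new_accum


if __name__ == "__main__":
    # Toy usage: two parameters, two per-example gradients.
    params, accum = [0.0, 0.0], [0.0, 0.0]
    grads = [[3.0, 4.0], [0.5, -0.5]]
    params, accum = dp_adagrad_step(params, grads, accum, rng=random.Random(0))
    print(params, accum)
```

In a federated setting, each client would run steps like this locally before its model weights are aggregated; how the paper combines the two is not stated in this record.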

Similar Items