Fed-DPSDG-WGAN: Differentially Private Synthetic Data Generation for Loan Default Prediction via Federated Wasserstein GAN
Lenders typically conduct thorough credit checks to mitigate credit default risk before lending money. Proactively predicting loan defaulters has become increasingly important. However, creating robust deep-learning algorithms that classify loan defaulters requires abundant data, potentially comprom...
| Main Authors: | Padmaja Ramachandra, Santhi Vaithiyanathan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Federated learning; WGAN; CTGAN; differentially private data; oversampling; trivial auto-encoder |
| Online Access: | https://ieeexplore.ieee.org/document/10930895/ |

| Field | Value |
|---|---|
| author | Padmaja Ramachandra; Santhi Vaithiyanathan |
| collection | DOAJ |
| description | Lenders typically conduct thorough credit checks to mitigate credit default risk before lending money. Proactively predicting loan defaulters has become increasingly important. However, creating robust deep-learning algorithms that classify loan defaulters requires abundant data, potentially compromising individuals’ privacy. We investigate the potential of replacing the original data with statistically similar synthetic data to be used as training data to predict loan defaulters. Generative adversarial networks stand out with the best privacy-utility trade-off among the many approaches that can generate synthetic data. However, limited data and membership inference attacks still significantly affect the privacy of the synthetic data generated. This paper uses federated learning to create synthetic data to improve loan defaulter prediction. By allowing multiple institutions to collaboratively train a shared model without directly exchanging sensitive loan data, this privacy-preserving approach allows for the utilization of more extensive and diverse datasets, potentially leading to significant improvements in prediction accuracy compared to traditional, siloed approaches. This method also builds a layer of privacy to safeguard against membership inference attacks using a differentially private optimizer layer in the WGAN architecture. The proposed framework utilizes a Min-Max Module and Autoencoders for global representation of distributed loan data, followed by a Federated WGAN, enhanced with a Gaussian noise-added Adagrad optimizer, to generate high-fidelity, privacy-preserved synthetic credit risk data. Furthermore, we use CTGAN to mitigate class imbalance in the synthetic dataset. Finally, we test the generated synthetic dataset using a model trained on the real dataset to check its utility. With this novel framework, we could attain a privacy-utility trade-off in a federated environment. Our extensive experimentation, which includes a variety of communication rounds, client populations, and privacy budgets, demonstrates that the proposed framework achieves an impressive 95% accuracy. The proposed model marks a significant improvement over both a non-federated setup, which achieves 82% accuracy, and FLIGAN with 73% accuracy, which possesses the same functionalities with different techniques. |
| format | Article |
| id | doaj-art-e61eb74e7caa4082ae0ad0182cd7d419 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| doi | 10.1109/ACCESS.2025.3552487 |
| volume | 13 |
| pages | 52069-52084 |
| orcid | Padmaja Ramachandra: https://orcid.org/0009-0006-1196-1327; Santhi Vaithiyanathan: https://orcid.org/0000-0002-4274-7474 |
| affiliation | School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India (both authors) |
| last_indexed | 2025-08-20T01:50:23Z |
| title | Fed-DPSDG-WGAN: Differentially Private Synthetic Data Generation for Loan Default Prediction via Federated Wasserstein GAN |
| topic | Federated learning; WGAN; CTGAN; differentially private data; oversampling; trivial auto-encoder |
| url | https://ieeexplore.ieee.org/document/10930895/ |
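
The description mentions a "Gaussian noise-added Adagrad optimizer" but this record gives no further detail. As a rough illustration only, the sketch below shows one common way such an update can be built: clip each example's gradient, add Gaussian noise to the sum, then apply a standard Adagrad step. The function names, the clipping rule, and the parameters `max_norm` and `noise_multiplier` are assumptions made for this example and are not taken from the article.

```python
import math
import random


def clip_by_l2_norm(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_adagrad_step(params, per_example_grads, accum, lr=0.1,
                    max_norm=1.0, noise_multiplier=1.0, eps=1e-8, rng=None):
    """One noisy Adagrad update (hypothetical helper, not the paper's code).

    Returns (new_params, new_accum); inputs are left unmodified.
    """
    rng = rng or random.Random()
    n = len(per_example_grads)
    # 1. Clip each example's gradient to bound its influence on the update.
    clipped = [clip_by_l2_norm(g, max_norm) for g in per_example_grads]
    # 2. Sum the clipped gradients, add Gaussian noise scaled to the
    #    clipping bound, then average over the batch.
    sigma = noise_multiplier * max_norm
    noisy = [
        (sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)) / n
        for i in range(len(params))
    ]
    # 3. Standard Adagrad: accumulate squared gradients and scale the step.
    new_accum = [a + g * g for a, g in zip(accum, noisy)]
    new_params = [
        p - lr * g / (math.sqrt(a) + eps)
        for p, g, a in zip(params, noisy, new_accum)
    ]
    return new_params, new_accum
```

In a federated setting, each client would presumably run steps like this locally before a server aggregates the resulting models. Relating `noise_multiplier` to a concrete privacy budget requires a privacy accountant, which this sketch does not include.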