NOTE: non-parametric oversampling technique for explainable credit scoring

Abstract Credit scoring models are critical for financial institutions to assess borrower risk and maintain profitability. Although machine learning models have improved credit scoring accuracy, imbalanced class distributions remain a major challenge. The widely used Synthetic Minority Oversampling...

Bibliographic Details
Main Authors: Seongil Han, Haemin Jung, Paul D. Yoo, Alessandro Provetti, Andrea Cali
Format: Article
Language: English
Published: Nature Portfolio 2024-10-01
Series: Scientific Reports
Subjects:
Online Access: https://doi.org/10.1038/s41598-024-78055-5
author Seongil Han
Haemin Jung
Paul D. Yoo
Alessandro Provetti
Andrea Cali
author_sort Seongil Han
collection DOAJ
description Abstract Credit scoring models are critical for financial institutions to assess borrower risk and maintain profitability. Although machine learning models have improved credit scoring accuracy, imbalanced class distributions remain a major challenge. The widely used Synthetic Minority Oversampling TEchnique (SMOTE) struggles with high-dimensional, non-linear data and may introduce noise through class overlap. Generative Adversarial Networks (GANs) have emerged as an alternative, offering the ability to model complex data distributions. Conditional Wasserstein GANs (cWGANs) have shown promise in handling both numerical and categorical features in credit scoring datasets. However, research on extracting latent features from non-linear data and improving model explainability remains limited. To address these challenges, this paper introduces the Non-parametric Oversampling Technique for Explainable credit scoring (NOTE). The NOTE offers a unified approach that integrates a Non-parametric Stacked Autoencoder (NSA) for capturing non-linear latent features, cWGAN for oversampling the minority class, and a classification process designed to enhance explainability. The experimental results demonstrate that NOTE surpasses state-of-the-art oversampling techniques by improving classification accuracy and model stability, particularly in non-linear and imbalanced credit scoring datasets, while also enhancing the explainability of the results.
format Article
id doaj-art-6f51d7d4f3b44c48a47e17c0fd3ddfc0
institution OA Journals
issn 2045-2322
language English
publishDate 2024-10-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-6f51d7d4f3b44c48a47e17c0fd3ddfc0
2025-08-20T02:18:24Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2024-10-01
14
1
1
18
10.1038/s41598-024-78055-5
NOTE: non-parametric oversampling technique for explainable credit scoring
Seongil Han (School of Computing & Mathematical Sciences, University of London, Birkbeck College)
Haemin Jung (Department of Industrial & Management Engineering, Korea National University of Transportation)
Paul D. Yoo (School of Computing & Mathematical Sciences, University of London, Birkbeck College)
Alessandro Provetti (School of Computing & Mathematical Sciences, University of London, Birkbeck College)
Andrea Cali (School of Computing & Mathematical Sciences, University of London, Birkbeck College)
Abstract Credit scoring models are critical for financial institutions to assess borrower risk and maintain profitability. Although machine learning models have improved credit scoring accuracy, imbalanced class distributions remain a major challenge. The widely used Synthetic Minority Oversampling TEchnique (SMOTE) struggles with high-dimensional, non-linear data and may introduce noise through class overlap. Generative Adversarial Networks (GANs) have emerged as an alternative, offering the ability to model complex data distributions. Conditional Wasserstein GANs (cWGANs) have shown promise in handling both numerical and categorical features in credit scoring datasets. However, research on extracting latent features from non-linear data and improving model explainability remains limited. To address these challenges, this paper introduces the Non-parametric Oversampling Technique for Explainable credit scoring (NOTE). The NOTE offers a unified approach that integrates a Non-parametric Stacked Autoencoder (NSA) for capturing non-linear latent features, cWGAN for oversampling the minority class, and a classification process designed to enhance explainability. The experimental results demonstrate that NOTE surpasses state-of-the-art oversampling techniques by improving classification accuracy and model stability, particularly in non-linear and imbalanced credit scoring datasets, while also enhancing the explainability of the results.
https://doi.org/10.1038/s41598-024-78055-5
Conditional Wasserstein generative adversarial networks
Stacked autoencoder
Explainable AI
Imbalanced class
Oversampling
Credit scoring
title NOTE: non-parametric oversampling technique for explainable credit scoring
topic Conditional Wasserstein generative adversarial networks
Stacked autoencoder
Explainable AI
Imbalanced class
Oversampling
Credit scoring
url https://doi.org/10.1038/s41598-024-78055-5