Robust Cross-Validation of Predictive Models Used in Credit Default Risk
Model validation is a challenging Machine Learning task, usually more difficult for consumer credit default models because of the availability of small datasets, the modeling of low-frequency events (imbalanced data), and the bias in the explanatory variables induced by the train/test sets split of...
| Main Authors: | Jose Vicente Alonso, Lorenzo Escot |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | credit default models; model selection; cross validation; covariate shift; imbalanced dataset |
| Online Access: | https://www.mdpi.com/2076-3417/15/10/5495 |
| _version_ | 1849711765530607616 |
|---|---|
| author | Jose Vicente Alonso; Lorenzo Escot |
| author_facet | Jose Vicente Alonso; Lorenzo Escot |
| author_sort | Jose Vicente Alonso |
| collection | DOAJ |
| description | Model validation is a challenging Machine Learning task, usually more difficult for consumer credit default models because of the availability of small datasets, the modeling of low-frequency events (imbalanced data), and the bias in the explanatory variables induced by the train/test sets split of the validation techniques (covariate shift). While many methodologies have been developed, cross-validation is perhaps the most widely accepted, often being part of the model development process by optimizing the hyperparameters of predictive algorithms. This experimental research focuses on evaluating existing robust cross-validation variants to address the issues of validating credit default models. In addition, some improvements to those methods are proposed and compared with a wide range of validation techniques, including fuzzy methods. To reach solid and practical conclusions, this work limits its scope to logistic regression, as it is the best-practice modeling technique in real-world applications of this context. It is shown that robust cross-validation algorithms lead to more stable estimates, as expected due to the more homogeneous partitions, which have a positive impact on the selection of credit default models. In addition, the enhancements proposed to existing robust techniques lead to improved results when there are data restrictions. |
| format | Article |
| id | doaj-art-2868d94575ca4e12a1f3c6b24a28d775 |
| institution | DOAJ |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-2868d94575ca4e12a1f3c6b24a28d775; 2025-08-20T03:14:32Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-05-01; vol. 15, no. 10, art. 5495; 10.3390/app15105495; Robust Cross-Validation of Predictive Models Used in Credit Default Risk; Jose Vicente Alonso (0); Lorenzo Escot (1); (0) Department of Applied Mathematics, National University of Distance Education (UNED), 28040 Madrid, Spain; (1) Research Institute for Statistics and Data Science, Complutense University of Madrid (UCM), 28040 Madrid, Spain; Model validation is a challenging Machine Learning task, usually more difficult for consumer credit default models because of the availability of small datasets, the modeling of low-frequency events (imbalanced data), and the bias in the explanatory variables induced by the train/test sets split of the validation techniques (covariate shift). While many methodologies have been developed, cross-validation is perhaps the most widely accepted, often being part of the model development process by optimizing the hyperparameters of predictive algorithms. This experimental research focuses on evaluating existing robust cross-validation variants to address the issues of validating credit default models. In addition, some improvements to those methods are proposed and compared with a wide range of validation techniques, including fuzzy methods. To reach solid and practical conclusions, this work limits its scope to logistic regression, as it is the best-practice modeling technique in real-world applications of this context. It is shown that robust cross-validation algorithms lead to more stable estimates, as expected due to the more homogeneous partitions, which have a positive impact on the selection of credit default models. In addition, the enhancements proposed to existing robust techniques lead to improved results when there are data restrictions.; https://www.mdpi.com/2076-3417/15/10/5495; credit default models; model selection; cross validation; covariate shift; imbalanced dataset |
| spellingShingle | Jose Vicente Alonso; Lorenzo Escot; Robust Cross-Validation of Predictive Models Used in Credit Default Risk; Applied Sciences; credit default models; model selection; cross validation; covariate shift; imbalanced dataset |
| title | Robust Cross-Validation of Predictive Models Used in Credit Default Risk |
| title_full | Robust Cross-Validation of Predictive Models Used in Credit Default Risk |
| title_fullStr | Robust Cross-Validation of Predictive Models Used in Credit Default Risk |
| title_full_unstemmed | Robust Cross-Validation of Predictive Models Used in Credit Default Risk |
| title_short | Robust Cross-Validation of Predictive Models Used in Credit Default Risk |
| title_sort | robust cross validation of predictive models used in credit default risk |
| topic | credit default models; model selection; cross validation; covariate shift; imbalanced dataset |
| url | https://www.mdpi.com/2076-3417/15/10/5495 |
| work_keys_str_mv | AT josevicentealonso robustcrossvalidationofpredictivemodelsusedincreditdefaultrisk AT lorenzoescot robustcrossvalidationofpredictivemodelsusedincreditdefaultrisk |