Robust Cross-Validation of Predictive Models Used in Credit Default Risk

Model validation is a challenging Machine Learning task, usually more difficult for consumer credit default models because of the availability of small datasets, the modeling of low-frequency events (imbalanced data), and the bias in the explanatory variables induced by the train/test sets split of...

Bibliographic Details
Main Authors: Jose Vicente Alonso, Lorenzo Escot
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Subjects: credit default models; model selection; cross validation; covariate shift; imbalanced dataset
Online Access: https://www.mdpi.com/2076-3417/15/10/5495
collection DOAJ
description Model validation is a challenging machine learning task, and it is usually harder for consumer credit default models because datasets are small, the modeled events are low-frequency (imbalanced data), and the train/test split used by validation techniques induces bias in the explanatory variables (covariate shift). While many methodologies have been developed, cross-validation is perhaps the most widely accepted, and it is often part of the model development process, where it is used to optimize the hyperparameters of predictive algorithms. This experimental research evaluates existing robust cross-validation variants that address the issues of validating credit default models. In addition, some improvements to those methods are proposed and compared with a wide range of validation techniques, including fuzzy methods. To reach solid and practical conclusions, this work limits its scope to logistic regression, as it is the best-practice modeling technique in real-world applications in this context. Robust cross-validation algorithms are shown to produce more stable estimates, as expected given their more homogeneous partitions, which has a positive impact on the selection of credit default models. In addition, the proposed enhancements to existing robust techniques lead to improved results when there are data restrictions.
id doaj-art-2868d94575ca4e12a1f3c6b24a28d775
institution DOAJ
issn 2076-3417
doi 10.3390/app15105495
volume 15
issue 10
article_number 5495
affiliation Jose Vicente Alonso: Department of Applied Mathematics, National University of Distance Education (UNED), 28040 Madrid, Spain
affiliation Lorenzo Escot: Research Institute for Statistics and Data Science, Complutense University of Madrid (UCM), 28040 Madrid, Spain
topic credit default models
model selection
cross validation
covariate shift
imbalanced dataset