Robust Cross-Validation of Predictive Models Used in Credit Default Risk

Bibliographic Details
Main Authors: Jose Vicente Alonso, Lorenzo Escot
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/10/5495
Description
Summary: Model validation is a challenging Machine Learning task, usually more difficult for consumer credit default models because of small available datasets, the modeling of low-frequency events (imbalanced data), and the bias in the explanatory variables induced by the train/test split of the validation techniques (covariate shift). While many methodologies have been developed, cross-validation is perhaps the most widely accepted, and it is often part of the model development process, where it is used to optimize the hyperparameters of predictive algorithms. This experimental research evaluates existing robust cross-validation variants to address the issues of validating credit default models. In addition, some improvements to those methods are proposed and compared with a wide range of validation techniques, including fuzzy methods. To reach solid and practical conclusions, this work limits its scope to logistic regression, as it is the best-practice modeling technique in real-world applications in this context. It is shown that robust cross-validation algorithms lead to more stable estimates, as expected given their more homogeneous partitions, which have a positive impact on the selection of credit default models. In addition, the proposed enhancements to existing robust techniques lead to improved results when there are data restrictions.
ISSN: 2076-3417
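The summary attributes the stability of robust cross-validation to more homogeneous partitions. This record does not describe the paper's own algorithms, so the sketch below is only a generic illustration: it uses stratified k-fold partitioning, a standard technique that keeps the default rate equal across folds when defaults are rare. The function name `stratified_kfold` and the sample data are hypothetical and are not taken from the article.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Split row indices into k folds whose class mix mirrors the full dataset.

    Hypothetical helper for illustration; not the article's method.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal each class's indices round-robin, so every fold receives
        # either floor(n_class / k) or ceil(n_class / k) rows of that class.
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)
        # Resume dealing where the previous class stopped to keep fold sizes balanced.
        offset += len(idxs)
    return folds


if __name__ == "__main__":
    # Hypothetical imbalanced sample: 50 defaults (label 1) among 1,000 loans.
    labels = [1] * 50 + [0] * 950
    folds = stratified_kfold(labels, k=5)
    for i, fold in enumerate(folds):
        rate = sum(labels[j] for j in fold) / len(fold)
        print(f"fold {i}: n={len(fold)}, default rate={rate:.3f}")
```

With this sample, every fold holds 200 loans and 10 defaults, so each fold has the same 5% default rate. In a full cross-validation loop, each fold would serve once as the validation set for a logistic regression trained on the remaining folds. A purely random split could leave some folds with noticeably more or fewer defaults than others.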