Robust Cross-Validation of Predictive Models Used in Credit Default Risk

Model validation is a challenging Machine Learning task, usually more difficult for consumer credit default models because of the availability of small datasets, the modeling of low-frequency events (imbalanced data), and the bias in the explanatory variables induced by the train/test sets split of...

Bibliographic Details
Main Authors:	Jose Vicente Alonso, Lorenzo Escot
Format:	Article
Language:	English
Published:	MDPI AG 2025-05-01
Series:	Applied Sciences
Subjects:	credit default models; model selection; cross validation; covariate shift; imbalanced dataset
Online Access:	https://www.mdpi.com/2076-3417/15/10/5495

_version_	1849711765530607616
author	Jose Vicente Alonso; Lorenzo Escot
collection	DOAJ
description	Model validation is a challenging Machine Learning task, usually more difficult for consumer credit default models because of the availability of small datasets, the modeling of low-frequency events (imbalanced data), and the bias in the explanatory variables induced by the train/test sets split of the validation techniques (covariate shift). While many methodologies have been developed, cross-validation is perhaps the most widely accepted, often being part of the model development process by optimizing the hyperparameters of predictive algorithms. This experimental research focuses on evaluating existing robust cross-validation variants to address the issues of validating credit default models. In addition, some improvements to those methods are proposed and compared with a wide range of validation techniques, including fuzzy methods. To reach solid and practical conclusions, this work limits its scope to logistic regression, as it is the best-practice modeling technique in real-world applications of this context. It is shown that robust cross-validation algorithms lead to more stable estimates, as expected due to the more homogeneous partitions, which have a positive impact on the selection of credit default models. In addition, the enhancements proposed to existing robust techniques lead to improved results when there are data restrictions.
format	Article
id	doaj-art-2868d94575ca4e12a1f3c6b24a28d775
institution	DOAJ
issn	2076-3417
language	English
publishDate	2025-05-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj-art-2868d94575ca4e12a1f3c6b24a28d775; 2025-08-20T03:14:32Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-05-01; 15; 10; 5495; 10.3390/app15105495; Robust Cross-Validation of Predictive Models Used in Credit Default Risk; Jose Vicente Alonso (0); Lorenzo Escot (1); Department of Applied Mathematics, National University of Distance Education (UNED), 28040 Madrid, Spain; Research Institute for Statistics and Data Science, Complutense University of Madrid (UCM), 28040 Madrid, Spain; https://www.mdpi.com/2076-3417/15/10/5495; credit default models; model selection; cross validation; covariate shift; imbalanced dataset
title	Robust Cross-Validation of Predictive Models Used in Credit Default Risk
topic	credit default models; model selection; cross validation; covariate shift; imbalanced dataset
url	https://www.mdpi.com/2076-3417/15/10/5495
