Credit Risk Modeling Using Interpreted XGBoost
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | University of Warsaw, 2023-01-01 |
Series: | European Management Studies |
Subjects: | |
Online Access: | https://press.wz.uw.edu.pl/ems/vol21/iss3/3 |
Summary: | Purpose: The aim of the paper is to develop a credit risk assessment model using the XGBoost classifier
supported by interpretation methods.
Design/methodology/approach: The risk modeling in this research is based on Extreme Gradient Boosting
(XGBoost), a method used for regression and classification problems. It builds a sequence
of decision trees, using gradient-based optimization of the loss function to minimize the errors
of weak estimators. We also use methods for local and global interpretability: ceteris paribus
charts, SHAP, and the feature importance approach.
Findings: Based on the research results, it can be concluded that XGBoost achieved higher values of performance
metrics than logistic regression, except for sensitivity. This means that XGBoost identified a smaller percentage of all
bad clients. The results of local interpretability lead to the conclusion that, for the client in question, the credit
decision is positively influenced by credit scores from external suppliers, while it is negatively influenced by
minimal external scoring and short seniority. The number of years in the car and higher education also have
a positive influence. Such information helps to justify a negative credit decision. The results of global interpretability
lead to the conclusion that higher values of the traits associated with the z-scores are accompanied by negative Shapley
values, which can be interpreted as a negative effect on the explained variable.
Research limitations/implications: XGBoost, ceteris paribus plots, SHAP, and feature importance methods
can be used to develop a credit risk assessment model that includes machine learning interpretability. The
main limitation of the research is that the results of XGBoost are compared only with those of logistic regression.
Future research should focus on comparing the results of XGBoost with other machine learning methods,
including neural networks.
Originality/value: One of the key processes in a bank is the credit decision process, which is the evaluation
of a client’s repayment risk. In the consumer finance sector, these processes are usually largely
automated, and the latest machine learning methods based on neural networks and ensemble
learning are increasingly used for this purpose. Although machine learning models achieve
higher accuracy of credit risk assessment than traditional statistical methods, their main problem
is low interpretability. The models often act as a “black box”.
However, interpreting the results of risk assessment models is very important because of the need
to explain to clients the reasons behind their credit risk assessment. |
---|---|
ISSN: | 2956-7602 |
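
Note on the sensitivity finding: the summary reports that XGBoost outperformed logistic regression on every metric except sensitivity. The sketch below illustrates how a model can score higher on accuracy while identifying a smaller share of bad clients. The labels and predictions are made-up illustrative values, not data or results from the paper, and `model_a` / `model_b` are hypothetical names.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Share of actual positives (here: bad clients) that the model flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive cases")
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of all cases that the model classifies correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels only: 1 = bad client, 0 = good client.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical model A flags 3 of the 4 bad clients but raises 2 false alarms.
model_a = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
# Hypothetical model B flags only 2 of the 4 bad clients, with no false alarms.
model_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, "accuracy:", accuracy(y_true, preds),
          "sensitivity:", sensitivity(y_true, preds))
```

In this toy case model B has the higher accuracy but the lower sensitivity, which is the kind of trade-off the Findings describe.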