Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data

Bibliographic Details
Main Authors: Yanhe Wang, Wei Wei, Zhuodong Liu, Jiahe Liu, Yinzhen Lv, Xiangyu Li
Format: Article
Language: English
Published: MDPI AG 2025-08-01
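Series: Mathematics
Subjects: machine learning; SHAP interpretability; financial prediction modeling; high-dimensional data analysis; corporate social responsibility; U-shaped relationships
Online Access: https://www.mdpi.com/2227-7390/13/15/2526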
author Yanhe Wang
Wei Wei
Zhuodong Liu
Jiahe Liu
Yinzhen Lv
Xiangyu Li
author_sort Yanhe Wang
collection DOAJ
description High-dimensional prediction problems with complex non-linear feature interactions present significant algorithmic challenges in machine learning, particularly when dealing with imbalanced datasets and multicollinearity issues. This study proposes an innovative Shapley Additive Explanations (SHAP)-enhanced machine learning framework that integrates SHAP with advanced ensemble methods for interpretable financialization prediction. The methodology simultaneously addresses high-dimensional feature selection using 40 independent variables (19 CSR-related and 21 financialization-related), multicollinearity issues, and model interpretability requirements. Using a comprehensive dataset of 25,642 observations from 3,776 Chinese A-share companies (2011–2022), we implement nine optimized machine learning algorithms with hyperparameter tuning via the Hippopotamus Optimization algorithm and five-fold cross-validation. XGBoost demonstrates superior performance with 99.34% explained variance, achieving an RMSE of 0.082 and R² of 0.299. SHAP analysis reveals non-linear U-shaped relationships between key predictors and financialization outcomes, with critical thresholds at approximately 10 for CSR_SocR, 1.5 for CSR_S, and 5 for CSR_CV. SOE status, EPU, ownership concentration, firm size, and housing prices emerge as the most influential predictors. Notable shifts in factor importance occur during the COVID-19 pandemic period (2020–2022). This work contributes a scalable, interpretable machine learning architecture for high-dimensional financial prediction problems, with applications in risk assessment, portfolio optimization, and regulatory monitoring systems.
format Article
id doaj-art-cfb09ff7ceb0495a8a7b3ce5e5e628bc
institution Kabale University
issn 2227-7390
language English
publishDate 2025-08-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj-art-cfb09ff7ceb0495a8a7b3ce5e5e628bc
2025-08-20T03:36:31Z
eng
MDPI AG
Mathematics
2227-7390
2025-08-01
Volume 13, Issue 15, Article 2526
10.3390/math13152526
Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data
Yanhe Wang (School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China)
Wei Wei (School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China)
Zhuodong Liu (School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China)
Jiahe Liu (School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China)
Yinzhen Lv (School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China)
Xiangyu Li (Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)
https://www.mdpi.com/2227-7390/13/15/2526
machine learning; SHAP interpretability; financial prediction modeling; high-dimensional data analysis; corporate social responsibility; U-shaped relationships
title Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data
topic machine learning
SHAP interpretability
financial prediction modeling
high-dimensional data analysis
corporate social responsibility
U-shaped relationships
url https://www.mdpi.com/2227-7390/13/15/2526
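
Illustrative note: the abstract reports U-shaped SHAP relationships with turning points (for example, near 10 for CSR_SocR). The sketch below is not the authors' pipeline. It is a minimal, standard-library illustration of how exact Shapley values are computed and why a quadratic effect shows up as a U-shaped attribution. The names `shapley_values` and `toy_model`, the coefficients, and the single-point baseline are all assumptions made for illustration; the SHAP library typically averages over a background dataset rather than using one reference point.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance `x`.

    Features outside a coalition are set to their `baseline` value
    (a simplifying assumption; not how the paper defines its background).
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with coalition features taken from x,
        # all other features taken from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical model: U-shaped in feature 0 with a turning point at 10,
    # linear in feature 1. Coefficients are invented for illustration.
    csr_score, firm_size = z
    return 0.01 * (csr_score - 10.0) ** 2 + 0.05 * firm_size


baseline = [10.0, 0.0]
for csr_score in (4.0, 10.0, 16.0):
    phi = shapley_values(toy_model, [csr_score, 2.0], baseline)
    print(csr_score, phi)
```

In this toy setting the attribution for feature 0 is smallest at the turning point and rises symmetrically on either side, which is the pattern a SHAP dependence plot would display as a U shape. The attributions also sum to the difference between the prediction and the baseline prediction (the efficiency property of Shapley values).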