Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data
High-dimensional prediction problems with complex non-linear feature interactions present significant algorithmic challenges in machine learning, particularly when dealing with imbalanced datasets and multicollinearity issues. This study proposes an innovative Shapley Additive Explanations (SHAP)-en...
| Main Authors: | Yanhe Wang, Wei Wei, Zhuodong Liu, Jiahe Liu, Yinzhen Lv, Xiangyu Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-08-01 |
| Series: | Mathematics |
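| Subjects: | machine learning; SHAP interpretability; financial prediction modeling; high-dimensional data analysis; corporate social responsibility; U-shaped relationships |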
| Online Access: | https://www.mdpi.com/2227-7390/13/15/2526 |
| _version_ | 1849405999321972736 |
|---|---|
| author | Yanhe Wang; Wei Wei; Zhuodong Liu; Jiahe Liu; Yinzhen Lv; Xiangyu Li |
| author_facet | Yanhe Wang; Wei Wei; Zhuodong Liu; Jiahe Liu; Yinzhen Lv; Xiangyu Li |
| author_sort | Yanhe Wang |
| collection | DOAJ |
| description | High-dimensional prediction problems with complex non-linear feature interactions present significant algorithmic challenges in machine learning, particularly when dealing with imbalanced datasets and multicollinearity issues. This study proposes an innovative Shapley Additive Explanations (SHAP)-enhanced machine learning framework that integrates SHAP with advanced ensemble methods for interpretable financialization prediction. The methodology simultaneously addresses high-dimensional feature selection using 40 independent variables (19 CSR-related and 21 financialization-related), multicollinearity issues, and model interpretability requirements. Using a comprehensive dataset of 25,642 observations from 3776 Chinese A-share companies (2011–2022), we implement nine optimized machine learning algorithms with hyperparameter tuning via the Hippopotamus Optimization algorithm and five-fold cross-validation. XGBoost demonstrates superior performance with 99.34% explained variance, achieving an RMSE of 0.082 and R<sup>2</sup> of 0.299. SHAP analysis reveals non-linear U-shaped relationships between key predictors and financialization outcomes, with critical thresholds at approximately 10 for CSR_SocR, 1.5 for CSR_S, and 5 for CSR_CV. SOE status, EPU, ownership concentration, firm size, and housing prices emerge as the most influential predictors. Notable shifts in factor importance occur during the COVID-19 pandemic period (2020–2022). This work contributes a scalable, interpretable machine learning architecture for high-dimensional financial prediction problems, with applications in risk assessment, portfolio optimization, and regulatory monitoring systems. |
| format | Article |
| id | doaj-art-cfb09ff7ceb0495a8a7b3ce5e5e628bc |
| institution | Kabale University |
| issn | 2227-7390 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Mathematics |
| spelling | doaj-art-cfb09ff7ceb0495a8a7b3ce5e5e628bc; 2025-08-20T03:36:31Z; eng; MDPI AG; Mathematics; 2227-7390; 2025-08-01; 13; 15; 2526; 10.3390/math13152526; Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data; Yanhe Wang (0), Wei Wei (1), Zhuodong Liu (2), Jiahe Liu (3), Yinzhen Lv (4), Xiangyu Li (5); (0)-(4) School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China; (5) Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; High-dimensional prediction problems with complex non-linear feature interactions present significant algorithmic challenges in machine learning, particularly when dealing with imbalanced datasets and multicollinearity issues. This study proposes an innovative Shapley Additive Explanations (SHAP)-enhanced machine learning framework that integrates SHAP with advanced ensemble methods for interpretable financialization prediction. The methodology simultaneously addresses high-dimensional feature selection using 40 independent variables (19 CSR-related and 21 financialization-related), multicollinearity issues, and model interpretability requirements. Using a comprehensive dataset of 25,642 observations from 3776 Chinese A-share companies (2011–2022), we implement nine optimized machine learning algorithms with hyperparameter tuning via the Hippopotamus Optimization algorithm and five-fold cross-validation. XGBoost demonstrates superior performance with 99.34% explained variance, achieving an RMSE of 0.082 and R<sup>2</sup> of 0.299. SHAP analysis reveals non-linear U-shaped relationships between key predictors and financialization outcomes, with critical thresholds at approximately 10 for CSR_SocR, 1.5 for CSR_S, and 5 for CSR_CV. SOE status, EPU, ownership concentration, firm size, and housing prices emerge as the most influential predictors. Notable shifts in factor importance occur during the COVID-19 pandemic period (2020–2022). This work contributes a scalable, interpretable machine learning architecture for high-dimensional financial prediction problems, with applications in risk assessment, portfolio optimization, and regulatory monitoring systems.; https://www.mdpi.com/2227-7390/13/15/2526; machine learning; SHAP interpretability; financial prediction modeling; high-dimensional data analysis; corporate social responsibility; U-shaped relationships |
| spellingShingle | Yanhe Wang; Wei Wei; Zhuodong Liu; Jiahe Liu; Yinzhen Lv; Xiangyu Li; Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data; Mathematics; machine learning; SHAP interpretability; financial prediction modeling; high-dimensional data analysis; corporate social responsibility; U-shaped relationships |
| title | Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data |
| title_full | Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data |
| title_fullStr | Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data |
| title_full_unstemmed | Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data |
| title_short | Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data |
| title_sort | interpretable machine learning framework for corporate financialization prediction a shap based analysis of high dimensional data |
| topic | machine learning; SHAP interpretability; financial prediction modeling; high-dimensional data analysis; corporate social responsibility; U-shaped relationships |
| url | https://www.mdpi.com/2227-7390/13/15/2526 |
| work_keys_str_mv | AT yanhewang interpretablemachinelearningframeworkforcorporatefinancializationpredictionashapbasedanalysisofhighdimensionaldata AT weiwei interpretablemachinelearningframeworkforcorporatefinancializationpredictionashapbasedanalysisofhighdimensionaldata AT zhuodongliu interpretablemachinelearningframeworkforcorporatefinancializationpredictionashapbasedanalysisofhighdimensionaldata AT jiaheliu interpretablemachinelearningframeworkforcorporatefinancializationpredictionashapbasedanalysisofhighdimensionaldata AT yinzhenlv interpretablemachinelearningframeworkforcorporatefinancializationpredictionashapbasedanalysisofhighdimensionaldata AT xiangyuli interpretablemachinelearningframeworkforcorporatefinancializationpredictionashapbasedanalysisofhighdimensionaldata |
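
The description above refers to Shapley Additive Explanations (SHAP) and to U-shaped relationships between predictors and the outcome. As a minimal sketch of the underlying idea, and not the authors' pipeline, the following computes exact Shapley values for a single observation by enumerating feature orderings. Everything named here is an assumption for illustration: `toy_model`, its coefficients, the three features, and the baseline are hypothetical, and the U-shape is simply placed at 10 to echo the threshold reported for CSR_SocR. The article itself uses XGBoost with SHAP on 40 variables, which this toy example does not reproduce.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one observation.

    Features absent from a coalition are held at their baseline value.
    Enumerating all orderings costs O(n!) model calls, so this is only
    practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)      # start from the all-baseline input
        previous = model(current)
        for i in order:
            current[i] = x[i]         # switch feature i to its observed value
            value = model(current)
            phi[i] += value - previous  # marginal contribution in this ordering
            previous = value
    return [p / factorial(n) for p in phi]


def toy_model(z):
    """Hypothetical model: U-shaped in csr (minimum at 10), with a size*soe interaction."""
    csr, size, soe = z
    return 0.01 * (csr - 10.0) ** 2 + 0.05 * size + 0.1 * soe * size


# Hypothetical observation and reference point (assumptions, not data from the article).
x = [4.0, 2.0, 1.0]
baseline = [10.0, 1.0, 0.0]

phi = shapley_values(toy_model, x, baseline)
print(dict(zip(["csr", "size", "soe"], phi)))
```

The contributions sum to `toy_model(x) - toy_model(baseline)`, which is the additivity property that gives SHAP its name. Because the `csr` term is separable from the other features, its contribution depends only on the distance from the minimum of the U, so values equally far above and below 10 receive the same attribution.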