Machine Learning for Chinese Corporate Fraud Prediction: Segmented Models Based on Optimal Training Windows



Bibliographic Details
Main Authors: Chang Chuan Goh, Yue Yang, Anthony Bellotti, Xiuping Hua
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/5/397
Description
Summary: We propose a comprehensive and practical framework for Chinese corporate fraud prediction that uses machine learning algorithms and incorporates classifiers, class imbalance, population drift, segmented models, and model evaluation. Based on a three-stage experiment, we first find that, among 17 machine learning models, the random forest classifier performs best at predicting corporate fraud. We then implement the sliding time window approach to handle population drift; the optimal training window we find demonstrates that population drift exists in fraud detection and must be addressed to improve model performance. Using the best machine learning model and the optimal training window, we build a general model and segmented models, and compare fraud types and industries by their predictive performance on four evaluation metrics and by their top features as identified with SHAP. The results indicate that segmented models outperform the general model for fraud types with low fraud rates and perform as well as the general model for most industries when controlling for training set size. The dissimilarities between the top-feature sets of the general and segmented models suggest that segmented models provide a better understanding of how fraud occurs.
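The record does not spell out how the optimal training window is chosen. The following is a minimal sketch of one way a sliding time window search could work, assuming firm-year data indexed by year and assuming "optimal" means the window length with the highest mean out-of-time score. The `score` callable is hypothetical: in practice it would fit a classifier (for example a random forest) on the training years and compute one of the evaluation metrics on the test year. All function names here are illustrative, not the authors' code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer: (training_years, test_year) -> metric value (higher is better).
Scorer = Callable[[List[int], int], float]


def window_splits(years: Sequence[int], window: int,
                  first_test_idx: int) -> List[Tuple[List[int], int]]:
    """Pair each test year with the `window` observed years immediately before it."""
    ys = sorted(set(years))
    if window > first_test_idx:
        raise ValueError("window is longer than the history before the first test year")
    return [(ys[i - window:i], ys[i]) for i in range(first_test_idx, len(ys))]


def select_training_window(years: Sequence[int], candidates: Sequence[int],
                           score: Scorer) -> Tuple[int, Dict[int, float]]:
    """Return the window length with the highest mean out-of-time score.

    Every candidate is evaluated on the same test years (those after the
    longest candidate window), so the mean scores are directly comparable.
    """
    ys = sorted(set(years))
    first_test_idx = max(candidates)
    if first_test_idx >= len(ys):
        raise ValueError("not enough years to test the longest candidate window")
    means: Dict[int, float] = {}
    for w in candidates:
        splits = window_splits(ys, w, first_test_idx)
        means[w] = sum(score(train, test) for train, test in splits) / len(splits)
    best = max(means, key=means.get)
    return best, means


if __name__ == "__main__":
    # Toy scorer, purely illustrative: it pretends a 3-year window is ideal.
    toy_score = lambda train, test: 1.0 - 0.1 * abs(len(train) - 3)
    best, means = select_training_window(range(2008, 2020), [1, 2, 3, 5], toy_score)
    print(best, means)
```

Fixing the test years across candidates is a deliberate choice: if each window length were scored on every year it could reach, shorter windows would be averaged over more (and earlier) years, and drift alone could make the comparison unfair.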
ISSN:2078-2489