Research on parameter selection and optimization of C4.5 algorithm based on algorithm applicability knowledge base

Bibliographic Details
Main Authors: Yiyan Zhang, Yi Xin, Qin Li
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-11901-2
Description
Summary: Given that the C4.5 decision tree algorithm achieves strong prediction accuracy on medical datasets and is highly interpretable, this paper studies how to select the algorithm's hyperparameters so that C4.5 models can be optimized quickly and accurately. Decision tree models are first built with different hyperparameter values and the performance of each model is evaluated. The evaluation results are then linked with metadata describing the characteristics of each dataset. Using three evaluation metrics (accuracy, AUC and F1-measure) on 293 base datasets, the study builds the meta-database needed to optimize the hyperparameter M. A model learned from this meta-database then recommends ranges of C4.5 hyperparameter values for datasets with different characteristics. The results show that for more than 65% of the datasets the hyperparameter M does not need to be tuned, which avoids the time wasted on unnecessary tuning. The model developed in this study for judging the optimal value of the hyperparameter reaches an accuracy above 80%. Test and evaluation results confirm the feasibility of the recommended hyperparameter values, providing an important basis for fast tuning and optimization of C4.5 parameters.
ISSN:2045-2322
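The meta-learning workflow summarized above can be sketched as follows. This is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation. It assumes the per-M scores (accuracy, AUC, F1) have already been computed for each base dataset, for example with Weka's J48, whose -M option sets the minimum number of instances per leaf (default 2). Whether the paper's M is exactly this parameter is an assumption here. The sketch also assumes the following: the three metrics are combined by an unweighted mean; a dataset "needs tuning" only if its best M beats the default by more than a tolerance; and a k-nearest-neighbour vote over meta-features stands in for the paper's recommendation model. All names (DatasetRecord, build_meta_database, recommend, etc.), the meta-features, and the toy numbers are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import mean

# Assumed default for M (Weka's J48 uses a minimum of 2 instances per leaf).
DEFAULT_M = 2


@dataclass
class DatasetRecord:
    """Evaluation results for one base dataset (hypothetical structure)."""
    name: str
    meta_features: tuple[float, ...]  # hypothetical dataset characteristics
    scores: dict[int, tuple[float, float, float]]  # M -> (accuracy, AUC, F1)


def combined_score(metrics: tuple[float, float, float]) -> float:
    # Assumption: accuracy, AUC and F1 are weighted equally.
    return mean(metrics)


def label_record(rec: DatasetRecord, tol: float = 0.01) -> tuple[bool, int]:
    """Return (needs_tuning, best_m); tuning counts as useful only if the
    best M beats DEFAULT_M by more than `tol` in combined score."""
    best_m = max(rec.scores, key=lambda m: combined_score(rec.scores[m]))
    gain = combined_score(rec.scores[best_m]) - combined_score(rec.scores[DEFAULT_M])
    return gain > tol, best_m


def build_meta_database(records, tol: float = 0.01):
    """Join each dataset's meta-features with its tuning label and best M."""
    return [(rec.meta_features, *label_record(rec, tol)) for rec in records]


def fraction_no_tuning(meta_db) -> float:
    """Share of datasets for which the default M is already good enough."""
    return sum(1 for _, needs, _ in meta_db if not needs) / len(meta_db)


def recommend(meta_db, query: tuple[float, ...], k: int = 3):
    """k-NN vote over meta-features (a stand-in for the paper's learned model).
    Returns (needs_tuning, (low_m, high_m))."""
    neighbours = sorted(meta_db, key=lambda row: math.dist(row[0], query))[:k]
    tuned = [best_m for _, needs, best_m in neighbours if needs]
    if 2 * len(tuned) <= len(neighbours):  # no majority for tuning: keep default
        return False, (DEFAULT_M, DEFAULT_M)
    return True, (min(tuned), max(tuned))


def toy_records() -> list[DatasetRecord]:
    """Made-up numbers for illustration only; not results from the paper."""
    return [
        DatasetRecord("a", (1.0, 0.2), {2: (0.80, 0.80, 0.80), 5: (0.81, 0.80, 0.80), 10: (0.79, 0.78, 0.78)}),
        DatasetRecord("b", (1.1, 0.25), {2: (0.70, 0.70, 0.70), 5: (0.705, 0.70, 0.70), 10: (0.69, 0.70, 0.70)}),
        DatasetRecord("c", (5.0, 3.0), {2: (0.60, 0.60, 0.60), 5: (0.70, 0.70, 0.70), 10: (0.75, 0.74, 0.74)}),
        DatasetRecord("d", (5.2, 3.1), {2: (0.60, 0.62, 0.58), 5: (0.68, 0.69, 0.67), 10: (0.66, 0.66, 0.66)}),
        DatasetRecord("e", (4.8, 2.9), {2: (0.65, 0.65, 0.65), 5: (0.70, 0.70, 0.70), 10: (0.64, 0.64, 0.64)}),
    ]
```

With these toy records, build_meta_database(toy_records()) marks two of the five datasets as not needing tuning, so fraction_no_tuning returns 0.4. A query whose meta-features lie close to datasets c, d and e gets the recommendation (True, (5, 10)), while a query close to a and b keeps the default M. The k-NN vote was chosen only because it is short and dependency-free; any classifier over the meta-database could take its place.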