Toward an Accurate Liver Disease Prediction Based on Two-Level Ensemble Stacking Model
Early detection of liver disease is difficult because the disease presents few symptoms. In this study, single and ensemble machine learning (ML) algorithms are applied to the Indian Liver Patient Dataset (ILPD), and their results, without and with feature selection technique...
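| Main Authors: | Marghany Hassan Mohamed, Botheina Hussein Ali, Ahmed Ibrahim Taloba, Ahmad O. Aseeri, Mohamed Abd Elaziz, Shaker El-Sappagah, Nora Mahmoud El-Rashidy |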
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Ensemble stacking, feature selection, ILPD dataset, liver disease prediction, machine learning |
| Online Access: | https://ieeexplore.ieee.org/document/10680897/ |
| _version_ | 1850165452446105600 |
|---|---|
| author | Marghany Hassan Mohamed, Botheina Hussein Ali, Ahmed Ibrahim Taloba, Ahmad O. Aseeri, Mohamed Abd Elaziz, Shaker El-Sappagah, Nora Mahmoud El-Rashidy |
| author_sort | Marghany Hassan Mohamed |
| collection | DOAJ |
| description | Early detection of liver disease is difficult because the disease presents few symptoms. In this study, single and ensemble machine learning (ML) algorithms are applied to the Indian Liver Patient Dataset (ILPD), and their results, without and with feature selection techniques, are compared with each other and with existing studies. In addition, a two-level ensemble stacking model, based on several meta-ensemble classifiers and the feature selection technique, is applied to optimize the accuracy of the ensemble classifiers. Several data preprocessing techniques are employed to optimize the accuracy of the proposed work: data encoding, data cleaning, data scaling, data skewness transformation, data balancing, and feature selection. The single ML models are logistic regression (LR), K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), and multilayer perceptron (MLP). The ensemble ML models are extra tree classifier, random forest (RF), gradient boosting, AdaBoost, extreme gradient boosting (XGBoost), and an ensemble stacking classifier. Among the ensemble models, the ensemble stacking model achieved the highest accuracies (93.88% without and 94.12% with the feature selection technique) under 10-fold cross-validation. The two-level ensemble stacking model achieved the highest performance when trained with the feature selection technique: accuracy (94.01%), precision (94.44%), recall (94.25%), F1-score (94.01%), and area under the ROC curve (94.25%). These results indicate that the proposed technique yields a high-performing prediction model for liver disease. |
| format | Article |
| id | doaj-art-3f73bca9f0d94533ad3e35a6fad6923d |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-3f73bca9f0d94533ad3e35a6fad6923d; 2025-08-20T02:21:45Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 180210-180237; DOI 10.1109/ACCESS.2024.3459429; document 10680897; Toward an Accurate Liver Disease Prediction Based on Two-Level Ensemble Stacking Model. Authors: Marghany Hassan Mohamed (0); Botheina Hussein Ali (1), https://orcid.org/0009-0008-3635-5678; Ahmed Ibrahim Taloba (2); Ahmad O. Aseeri (3), https://orcid.org/0000-0002-4234-4069; Mohamed Abd Elaziz (4), https://orcid.org/0000-0002-7682-6269; Shaker El-Sappagah (5), https://orcid.org/0000-0001-9705-1477; Nora Mahmoud El-Rashidy (6), https://orcid.org/0000-0001-8177-9439. Affiliations (in listed order): Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt; Department of Information System, Faculty of Computers and Information, Arish University, Arish, Egypt; Department of Information System, Faculty of Computers and Information, Assiut University, Asyut, Egypt; Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia; Faculty of Computer Science and Engineering, Galala University, Suez, Egypt; Faculty of Computer Science and Engineering, Galala University, Suez, Egypt; Department of Machine Learning and Information Retrieval, Faculty of Artificial Intelligence, Kafrelsheikh University, Kafr El-Shaikh, Egypt. URL: https://ieeexplore.ieee.org/document/10680897/. Keywords: Ensemble stacking; feature selection; ILPD dataset; liver disease prediction; machine learning |
| title | Toward an Accurate Liver Disease Prediction Based on Two-Level Ensemble Stacking Model |
| topic | Ensemble stacking, feature selection, ILPD dataset, liver disease prediction, machine learning |
| url | https://ieeexplore.ieee.org/document/10680897/ |
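
The description field outlines a two-level stacking model trained with feature selection and scored by 10-fold cross-validation. The following is a minimal scikit-learn sketch of that general idea, not the authors' implementation: the choice of base learners, the meta-learners, `SelectKBest` with `k=6`, the inner `cv=3`, and the synthetic data from `make_classification` (standing in for ILPD) are all assumptions made for illustration, and the helper names (`base_learners`, `level_one_stack`, `build_two_level_model`, `evaluate`) are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def base_learners():
    """Single models used as level-0 learners (an assumed subset)."""
    return [
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ]


def level_one_stack(meta):
    """One stacking ensemble: base learners combined by the given meta-learner."""
    return StackingClassifier(estimators=base_learners(), final_estimator=meta, cv=3)


def build_two_level_model(k_features=6):
    """Scale, select features, then stack two stacking ensembles (assumed layout)."""
    level_one = [
        ("stack_lr", level_one_stack(LogisticRegression(max_iter=1000))),
        ("stack_rf", level_one_stack(RandomForestClassifier(n_estimators=25, random_state=0))),
    ]
    level_two = StackingClassifier(
        estimators=level_one,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    # Keeping scaling and selection inside the pipeline means they are
    # refit on each training fold, so no information leaks from the test fold.
    return make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k_features), level_two)


def evaluate(model, X, y, folds=10):
    """Return the per-fold accuracy from stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")


# Synthetic binary-classification data; sizes are arbitrary placeholders.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)
scores = evaluate(build_two_level_model(), X, y)
print(f"mean 10-fold accuracy: {scores.mean():.3f}")
```

Accuracy on this synthetic data says nothing about the figures reported in the record. The sketch only shows how the pieces named in the description could fit together.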