Toward an Accurate Liver Disease Prediction Based on Two-Level Ensemble Stacking Model

The difficulty of detecting liver disease at an early stage stems from its limited number of symptoms. In this study, single and ensemble machine learning (ML) algorithms are applied to the Indian Liver Patient Dataset (ILPD), and their results, without and with feature selection technique...

Bibliographic Details
Main Authors: Marghany Hassan Mohamed, Botheina Hussein Ali, Ahmed Ibrahim Taloba, Ahmad O. Aseeri, Mohamed Abd Elaziz, Shaker El-Sappagah, Nora Mahmoud El-Rashidy
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Ensemble stacking; feature selection; ILPD dataset; liver disease prediction; machine learning
Online Access: https://ieeexplore.ieee.org/document/10680897/
author Marghany Hassan Mohamed
Botheina Hussein Ali
Ahmed Ibrahim Taloba
Ahmad O. Aseeri
Mohamed Abd Elaziz
Shaker El-Sappagah
Nora Mahmoud El-Rashidy
collection DOAJ
description The difficulty of detecting liver disease at an early stage stems from its limited number of symptoms. In this study, single and ensemble machine learning (ML) algorithms are applied to the Indian Liver Patient Dataset (ILPD), and their results, without and with feature selection techniques, are compared with each other and with existing studies. In addition, a two-level ensemble stacking model, built on several meta-ensemble classifiers and the feature selection technique, is applied to optimize the accuracy of the ensemble classifiers. Several data preprocessing techniques are employed to optimize the accuracy of the proposed work: data encoding, data cleaning, data scaling, skewness transformation, data balancing, and feature selection. The single ML models are logistic regression (LR), K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), and multilayer perceptron (MLP). The ensemble ML models are extra trees classifier, random forest (RF), gradient boosting, AdaBoost, extreme gradient boosting (XGBoost), and ensemble stacking classifier. Among the ensemble models, the ensemble stacking model achieved the highest accuracies (93.88% without and 94.12% with the feature selection technique) under 10-fold cross-validation. The two-level ensemble stacking model achieved the highest performance when trained with the feature selection technique, with the following metric values: accuracy (94.01%), precision (94.44%), recall (94.25%), F1-score (94.01%), and area under the ROC curve (94.25%). These results indicate that the proposed technique yields a highly accurate prediction model for liver disease.
format Article
id doaj-art-3f73bca9f0d94533ad3e35a6fad6923d
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-3f73bca9f0d94533ad3e35a6fad6923d
2025-08-20T02:21:45Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
Volume 12, pages 180210-180237
10.1109/ACCESS.2024.3459429
10680897
Toward an Accurate Liver Disease Prediction Based on Two-Level Ensemble Stacking Model
Marghany Hassan Mohamed (Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt)
Botheina Hussein Ali, https://orcid.org/0009-0008-3635-5678 (Department of Information System, Faculty of Computers and Information, Arish University, Arish, Egypt)
Ahmed Ibrahim Taloba (Department of Information System, Faculty of Computers and Information, Assiut University, Asyut, Egypt)
Ahmad O. Aseeri, https://orcid.org/0000-0002-4234-4069 (Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia)
Mohamed Abd Elaziz, https://orcid.org/0000-0002-7682-6269 (Faculty of Computer Science and Engineering, Galala University, Suez, Egypt)
Shaker El-Sappagah, https://orcid.org/0000-0001-9705-1477 (Faculty of Computer Science and Engineering, Galala University, Suez, Egypt)
Nora Mahmoud El-Rashidy, https://orcid.org/0000-0001-8177-9439 (Department of Machine Learning and Information Retrieval, Faculty of Artificial Intelligence, Kafrelsheikh University, Kafr El-Shaikh, Egypt)
https://ieeexplore.ieee.org/document/10680897/
Ensemble stacking; feature selection; ILPD dataset; liver disease prediction; machine learning
title Toward an Accurate Liver Disease Prediction Based on Two-Level Ensemble Stacking Model
topic Ensemble stacking
feature selection
ILPD dataset
liver disease prediction
machine learning
url https://ieeexplore.ieee.org/document/10680897/