Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection.

Around 1.5 million new cases of Hepatitis C Virus (HCV) are diagnosed globally each year (World Health Organization, 2023). Consequently, there is a pressing need for early diagnostic methods for HCV. This study investigates the prognostic accuracy of several ensemble machine learning (ML) models fo...

Bibliographic Details
Main Authors: Ekramul Haque Tusher, Mohd Arfian Ismail, Abdullah Akib, Lubna A Gabralla, Ashraf Osman Ibrahim, Hafizan Mat Som, Muhammad Akmal Remli
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0326488
_version_ 1849427375668854784
author Ekramul Haque Tusher
Mohd Arfian Ismail
Abdullah Akib
Lubna A Gabralla
Ashraf Osman Ibrahim
Hafizan Mat Som
Muhammad Akmal Remli
author_sort Ekramul Haque Tusher
collection DOAJ
description Around 1.5 million new cases of Hepatitis C Virus (HCV) are diagnosed globally each year (World Health Organization, 2023). Consequently, there is a pressing need for early diagnostic methods for HCV. This study investigates the prognostic accuracy of several ensemble machine learning (ML) models for diagnosing HCV infection. The study utilizes a dataset comprising demographic information of 615 individuals suspected of having HCV infection. Additionally, the research employs oversampling and undersampling techniques to address class imbalances in the dataset and conducts feature reduction using the F-test in one-way analysis of variance. Ensemble ML methods, including Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and Decision Tree (DT), are used to predict HCV infection. The performance of these ensemble methods is evaluated using metrics such as accuracy, recall, precision, F1 score, G-mean, balanced accuracy, cross-validation (CV), area under the curve (AUC), standard deviation, and error rate. Compared with previous studies, the Bagging k-NN model demonstrated superior performance under oversampling conditions, achieving 98.37% accuracy, 98.23% CV score, 97.67% precision, 97.93% recall, 98.18% selectivity, 97.79% F1 score, 98.06% balanced accuracy, 98.05% G-mean, a 1.63% error rate, 0.98 AUC, and a standard deviation of 0.192. This study highlights the potential of ensemble ML approaches in improving the diagnosis of HCV. The findings provide a foundation for developing accurate predictive methods for HCV diagnosis.
format Article
id doaj-art-b1e2af4200434fc9b540d4fa557d46fa
institution Kabale University
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-b1e2af4200434fc9b540d4fa557d46fa; 2025-08-20T03:29:03Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; 20; 6; e0326488; 10.1371/journal.pone.0326488; https://doi.org/10.1371/journal.pone.0326488
title Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection.
url https://doi.org/10.1371/journal.pone.0326488
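
Several of the metrics quoted in the description are functions of the others, so the reported Bagging k-NN figures can be cross-checked against each other. The sketch below is illustrative only and is not taken from the article. It assumes the standard binary-classification definitions, and it assumes that "selectivity" in the description means specificity (true negative rate). The function names are hypothetical. If the study used multi-class or averaged definitions, small differences from the reported values would be expected.

```python
import math


def balanced_accuracy(recall, specificity):
    """Mean of recall (sensitivity) and specificity."""
    return (recall + specificity) / 2


def g_mean(recall, specificity):
    """Geometric mean of recall (sensitivity) and specificity."""
    return math.sqrt(recall * specificity)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def error_rate(accuracy):
    """Fraction of misclassified samples."""
    return 1 - accuracy


# Figures quoted in the description for Bagging k-NN under oversampling.
# Treating "selectivity" as specificity is an assumption.
reported = {
    "accuracy": 0.9837,
    "precision": 0.9767,
    "recall": 0.9793,
    "specificity": 0.9818,
}

derived = {
    "balanced accuracy": balanced_accuracy(reported["recall"], reported["specificity"]),
    "G-mean": g_mean(reported["recall"], reported["specificity"]),
    "F1 score": f1_score(reported["precision"], reported["recall"]),
    "error rate": error_rate(reported["accuracy"]),
}

for name, value in derived.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the derived values land within roughly 0.01 percentage points of the balanced accuracy (98.06%), G-mean (98.05%), F1 score (97.79%), and error rate (1.63%) given in the description, which is consistent with rounding.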