Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection.
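| Main Authors: | Ekramul Haque Tusher, Mohd Arfian Ismail, Abdullah Akib, Lubna A Gabralla, Ashraf Osman Ibrahim, Hafizan Mat Som, Muhammad Akmal Remli |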
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0326488 |
| Field | Value |
|---|---|
| author | Ekramul Haque Tusher, Mohd Arfian Ismail, Abdullah Akib, Lubna A Gabralla, Ashraf Osman Ibrahim, Hafizan Mat Som, Muhammad Akmal Remli |
| collection | DOAJ |
| description | Around 1.5 million new Hepatitis C Virus (HCV) infections occur globally each year (World Health Organization, 2023), so there is a pressing need for early diagnostic methods for HCV. This study investigates the predictive accuracy of several bagging-based ensemble machine learning (ML) models for diagnosing HCV infection, using a dataset of demographic information on 615 individuals suspected of having HCV infection. Oversampling and undersampling are applied to address class imbalance in the dataset, and features are selected with the F-test from one-way analysis of variance (ANOVA). Bagging ensembles built on Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and Decision Tree (DT) base classifiers are used to predict HCV infection. Their performance is evaluated with accuracy, recall, precision, selectivity, F1 score, G-mean, balanced accuracy, cross-validation (CV) score, area under the curve (AUC), standard deviation, and error rate. Compared with previous studies, the Bagging k-NN model achieved superior performance under oversampling, with 98.37% accuracy, a 98.23% CV score, 97.67% precision, 97.93% recall, 98.18% selectivity, a 97.79% F1 score, 98.06% balanced accuracy, a 98.05% G-mean, a 1.63% error rate, an AUC of 0.98, and a standard deviation of 0.192. These results highlight the potential of ensemble ML approaches for HCV diagnosis and provide a foundation for developing accurate predictive methods. |
| format | Article |
| id | doaj-art-b1e2af4200434fc9b540d4fa557d46fa |
| institution | Kabale University |
| issn | 1932-6203 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Public Library of Science (PLoS) |
| record_format | Article |
| series | PLoS ONE |
| citation | PLoS ONE, vol. 20, no. 6, e0326488 |
| title | Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection. |
| url | https://doi.org/10.1371/journal.pone.0326488 |
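
The abstract names its building blocks (resampling for class imbalance, one-way ANOVA F-test feature selection, bagged k-NN, and metrics such as selectivity, balanced accuracy, and G-mean) without showing how they fit together. The following is a minimal, dependency-free Python sketch of those general techniques, not the authors' implementation. It assumes a binary label (positive class `1`), Euclidean distance, random duplication for oversampling (the record does not say whether SMOTE or another method was used), and the common binary definitions balanced accuracy = (recall + selectivity) / 2 and G-mean = √(recall × selectivity). All function names, the values of `k` and `n_estimators`, and the toy data are hypothetical; scikit-learn (`SelectKBest` with `f_classif`, `BaggingClassifier`, `KNeighborsClassifier`) and imbalanced-learn (`RandomOverSampler`, `RandomUnderSampler`) provide production versions of these pieces.

```python
import math
import random
from collections import Counter


def anova_f(values, labels):
    """One-way ANOVA F statistic of one numeric feature grouped by class label."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more rows than classes")
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0  # a constant feature scores 0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, y, k):
    """Column indices of the k features with the largest ANOVA F statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def random_oversample(X, y, rng):
    """Duplicate random minority-class rows until every class has the majority count."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, rng):
    """Keep a random subset of each class so every class has the minority count."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [row for row, lab in zip(X, y) if lab == label]
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k):
    """Majority label among the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def bagging_knn_predict(X_train, y_train, X_test, n_estimators=10, k=3, seed=0):
    """Bagging: one k-NN per bootstrap resample, combined by majority vote."""
    rng = random.Random(seed)
    n = len(X_train)
    votes = [Counter() for _ in X_test]
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        Xb, yb = [X_train[i] for i in idx], [y_train[i] for i in idx]
        for t, x in enumerate(X_test):
            votes[t][knn_predict(Xb, yb, x, k)] += 1
    return [v.most_common(1)[0][0] for v in votes]


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary problem; selectivity is specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    selectivity = tn / (tn + fp) if tn + fp else 0.0
    return {"accuracy": accuracy, "recall": recall, "selectivity": selectivity,
            "balanced_accuracy": (recall + selectivity) / 2,
            "g_mean": math.sqrt(recall * selectivity), "error_rate": 1 - accuracy}


if __name__ == "__main__":
    # Hypothetical toy data, NOT the study's dataset: column 0 separates the classes, column 1 is noise.
    X_train = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [1.1, 5.2], [1.0, 4.9], [1.3, 5.0], [3.0, 5.0], [3.2, 4.9]]
    y_train = [0, 0, 0, 0, 0, 0, 1, 1]
    keep = select_k_best(X_train, y_train, k=1)
    Xs = [[row[j] for j in keep] for row in X_train]
    Xo, yo = random_oversample(Xs, y_train, random.Random(42))  # resample the training split only
    X_test, y_test = [[1.05, 5.0], [3.1, 5.0]], [0, 1]
    preds = bagging_knn_predict(Xo, yo, [[row[j] for j in keep] for row in X_test])
    print(keep, preds, binary_metrics(y_test, preds))
```

Resampling is applied to the training split only, so duplicated rows never appear in the evaluation data. Under the binary definitions assumed here, the reported recall (97.93%) and selectivity (98.18%) give a balanced accuracy of about 98.06% and a G-mean of about 98.05%, and the error rate is 100% minus the 98.37% accuracy, or 1.63%, which matches the figures reported in the abstract.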