Prediction of hepatitis-C virus using statistical learning models
Abstract: The hepatitis C virus (HCV) causes a viral infection that targets the liver and has emerged as a significant global health concern. This study investigates the classification of HCV patients by identifying the factors crucial for the progression and early detection of HCV. The study i...
| Main Authors: | Shalini Kumari, Subhajit Das, Prashant Kumar Sonker, Agni Saroj, Mukesh Kumar |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Discover Public Health |
| Subjects: | HCV; SMOTE; Machine learning; Linear SVM; Artificial neural network; Prediction |
| Online Access: | https://doi.org/10.1186/s12982-025-00654-y |
| _version_ | 1850105607841906688 |
|---|---|
| author | Shalini Kumari, Subhajit Das, Prashant Kumar Sonker, Agni Saroj, Mukesh Kumar |
| author_sort | Shalini Kumari |
| collection | DOAJ |
| description | Abstract: The hepatitis C virus (HCV) causes a viral infection that targets the liver and has emerged as a significant global health concern. This study investigates the classification of HCV patients by identifying the factors crucial for the progression and early detection of HCV. For illustrative purposes, the study uses a dataset of 615 HCV patients from the UCI Machine Learning Repository and analyzes it with machine learning models: naive Bayes (NB), random forest (RF), support vector machine (SVM), logistic regression (LR), decision tree (DT), and artificial neural network (ANN). The models were evaluated using various performance metrics, and a comparative analysis using non-parametric tests was conducted to evaluate the statistical significance of the models. The empirical findings show that the RF model achieved the highest performance: an accuracy of 96.71% with a Brier score (BS) of 0.035 and a Matthews correlation coefficient (MCC) of 0.849 using all features; an accuracy of 96.45% with a BS of 0.031 and an MCC of 0.837 using selected features; and an accuracy of 97.41% with a BS of 0.026 and an MCC of 0.947 using selected features with the synthetic minority oversampling technique (SMOTE). The analytical methods improved the overall predictive accuracy for HCV infection and can aid in the early identification of the disease. As a result, patients can be treated at the earliest possible stage, thereby increasing the number of lives saved. |
| format | Article |
| id | doaj-art-ea01b2a39a11413fbb986c8c11a39e09 |
| institution | OA Journals |
| issn | 3005-0774 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Public Health |
| spelling | doaj-art-ea01b2a39a11413fbb986c8c11a39e09; 2025-08-20T02:39:02Z; eng; Springer; Discover Public Health; 3005-0774; 2025-05-01; 221116; 10.1186/s12982-025-00654-y; Prediction of hepatitis-C virus using statistical learning models; Shalini Kumari (Department of Statistics, Banaras Hindu University); Subhajit Das (Department of Mathematics and Statistics, Indian Institute of Science Education and Research); Prashant Kumar Sonker (Department of Community Medicine, Madhav Prasad Tripathi Medical College); Agni Saroj (Department of Community Medicine, Dr. Sonelal Patel Autonomous State Medical College); Mukesh Kumar (Department of Statistics, MMV, Banaras Hindu University); https://doi.org/10.1186/s12982-025-00654-y; HCV; SMOTE; Machine learning; Linear SVM; Artificial neural network; Prediction |
| title | Prediction of hepatitis-C virus using statistical learning models |
| topic | HCV; SMOTE; Machine learning; Linear SVM; Artificial neural network; Prediction |
| url | https://doi.org/10.1186/s12982-025-00654-y |
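
The abstract reports three metrics for each model: accuracy, Brier score (BS), and Matthews correlation coefficient (MCC). As a reading aid, the minimal sketch below shows how these metrics are commonly defined for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration, and the record does not state whether the study treated the task as binary.

```python
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome.

    Lower is better; 0 means perfectly confident, correct probabilities.
    """
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = positive, 0 = negative), in [-1, 1]."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data (not taken from the study).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.1, 0.4, 0.3, 0.6, 0.8, 0.1]
# Assumed decision threshold of 0.5 to turn probabilities into labels.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

acc = accuracy(y_true, y_pred)
bs = brier_score(y_true, y_prob)
mcc = matthews_corrcoef(y_true, y_pred)
print(f"accuracy={acc:.3f}  brier={bs:.3f}  mcc={mcc:.3f}")
```

MCC uses all four cells of the confusion matrix, so it is less flattering than accuracy on imbalanced data, which is one reason it is often reported alongside oversampling techniques such as SMOTE.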