Prediction of hepatitis-C virus using statistical learning models

Bibliographic Details
Main Authors: Shalini Kumari, Subhajit Das, Prashant Kumar Sonker, Agni Saroj, Mukesh Kumar
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Discover Public Health
Online Access: https://doi.org/10.1186/s12982-025-00654-y
collection DOAJ
description Abstract The hepatitis C virus (HCV) causes a viral infection that targets the liver and has emerged as a significant global health concern. This study classifies HCV patients and identifies potential factors crucial for the progression and early detection of HCV. It uses a dataset of 615 records (HCV patients and blood donors) from the UCI Machine Learning Repository for illustrative purposes and analyzes it with machine learning models: naive Bayes (NB), random forest (RF), support vector machine (SVM), logistic regression (LR), decision trees (DT), and an artificial neural network (ANN). The models were evaluated with several performance metrics, and a comparative analysis using non-parametric tests was conducted to assess whether the differences between the models are statistically significant. The empirical findings show that the RF model achieved the highest performance: an accuracy of 96.71% with a Brier score (BS) of 0.035 and a Matthews correlation coefficient (MCC) of 0.849 using all features; an accuracy of 96.45% with a BS of 0.031 and an MCC of 0.837 using selected features; and an accuracy of 97.41% with a BS of 0.026 and an MCC of 0.947 using selected features together with the synthetic minority oversampling technique (SMOTE). These analytical methods improved the overall predictive accuracy for HCV infection and can aid early identification of the disease, so that patients can be treated at the earliest possible stage, potentially saving more lives.
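The abstract reports the Brier score and the Matthews correlation coefficient alongside accuracy. As a rough illustration only, the sketch below computes both metrics for a binary classifier. The binary framing (1 = HCV, 0 = control), the 0.5 decision threshold, the function names binary_brier_score and binary_mcc, and the toy labels are all hypothetical assumptions, not the study's data or code; this record does not state how the authors defined or averaged the metrics (for example, across multiple disease categories). In practice, scikit-learn provides sklearn.metrics.brier_score_loss and sklearn.metrics.matthews_corrcoef for the same quantities.

```python
import math


def binary_brier_score(y_true, p_pred):
    """Mean squared difference between the predicted probability of the
    positive class and the observed 0/1 outcome (lower is better)."""
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def binary_mcc(y_true, y_pred):
    """Binary Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when the denominator is zero (metric undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy example (not data from the study): 1 = HCV, 0 = control.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
p_hat = [0.9, 0.2, 0.1, 0.4, 0.3, 0.0, 0.8, 0.6]
# Assumed 0.5 threshold to turn probabilities into class labels.
y_pred = [1 if p >= 0.5 else 0 for p in p_hat]

print(round(binary_brier_score(y_true, p_hat), 3), round(binary_mcc(y_true, y_pred), 3))
# prints: 0.114 0.467
```

The Brier score is computed from predicted probabilities, whereas the MCC uses thresholded labels, which is why a model can rank differently on the two metrics.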
issn 3005-0774
affiliations Shalini Kumari: Department of Statistics, Banaras Hindu University
Subhajit Das: Department of Mathematics and Statistics, Indian Institute of Science Education and Research
Prashant Kumar Sonker: Department of Community Medicine, Madhav Prasad Tripathi Medical College
Agni Saroj: Department of Community Medicine, Dr. Sonelal Patel Autonomous State Medical College
Mukesh Kumar: Department of Statistics, MMV, Banaras Hindu University
topic HCV
SMOTE
Machine learning
Linear SVM
Artificial neural network
Prediction