A cross dataset meta-model for hepatitis C detection using multi-dimensional pre-clustering

Abstract Hepatitis C is a liver infection triggered by the hepatitis C virus (HCV). The infection results in swelling and irritation of the liver, which is called inflammation. Prolonged untreated exposure to the virus can lead to chronic hepatitis C. This can result in serious health complications...

Bibliographic Details
Main Authors: Aryan Sharma, Tanmay Khade, Shashank Mouli Satapathy
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Reports
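Subjects: Hepatitis C; Clustering; K-centroid clustering; K-means clustering; K-modes clustering; Machine learning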
Online Access: https://doi.org/10.1038/s41598-025-91298-0
collection DOAJ
description Abstract Hepatitis C is a liver infection triggered by the hepatitis C virus (HCV). The infection results in swelling and irritation of the liver, which is called inflammation. Prolonged untreated exposure to the virus can lead to chronic hepatitis C. This can result in serious health complications such as liver damage, hepatocellular carcinoma (HCC), and potentially death. Therefore, rapid diagnosis and prompt treatment of HCV is crucial. This study utilizes machine learning (ML) to precisely identify hepatitis C in patients by analyzing parameters obtained from a standard biochemistry test. A hybrid dataset was acquired by merging two commonly used datasets from individual sources. A portion of the dataset was used as a hold-out set to simulate real-world data. A multi-dimensional pre-clustering approach was used in this study in the form of k-means for binning and k-modes for categorical clustering. The pre-clustering approach was used to extract a new feature. This extracted feature column was added to the original dataset and was used to train a stacked meta-model. The model was compared against baseline models. The predictions were further elaborated using explainable artificial intelligence. The models used were XGBoost, K-nearest neighbor, support vector classifier, and random forest (RF). The baseline score obtained was 94.25% using RF, while the meta-model gave a score of 94.82%.
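The pre-clustering step described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is one possible reading, not the authors' implementation: the abstract does not say how the two steps are chained, so the sketch assumes that each numeric column is binned with one-dimensional k-means and that k-modes then clusters the binned rows, with the resulting label appended as the new feature. The function names (`kmeans_1d`, `bin_column`, `kmodes`, `add_cluster_feature`), the bin and cluster counts, and the toy values are all hypothetical. The stacked meta-model itself is not shown.

```python
import random
from collections import Counter


def kmeans_1d(values, k, iters=50, seed=0):
    """Lloyd's k-means on a single numeric column; returns sorted centroids."""
    rng = random.Random(seed)
    # Start from k distinct observed values (needs at least k distinct values).
    centroids = sorted(rng.sample(sorted(set(values)), k))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = sorted(
            sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)
        )
        if updated == centroids:
            break
        centroids = updated
    return centroids


def bin_column(values, centroids):
    """Replace each value with the index of its nearest centroid (its bin)."""
    return [
        min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
        for v in values
    ]


def hamming(a, b):
    """Number of positions at which two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def kmodes(rows, k, iters=20, seed=0):
    """Minimal k-modes: Hamming distance, per-column mode as cluster centre."""
    rng = random.Random(seed)
    # Start from k distinct rows (needs at least k distinct rows).
    modes = rng.sample(sorted(set(rows)), k)
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


def add_cluster_feature(table, n_bins=3, n_clusters=2, seed=0):
    """Append a pre-clustering label to every row of a numeric table."""
    columns = list(zip(*table))
    binned_cols = [
        bin_column(col, kmeans_1d(col, n_bins, seed=seed)) for col in columns
    ]
    binned_rows = [tuple(r) for r in zip(*binned_cols)]
    labels, _ = kmodes(binned_rows, n_clusters, seed=seed)
    return [list(row) + [lab] for row, lab in zip(table, labels)]


# Synthetic toy values, not taken from the study's datasets.
toy = [
    [20.0, 0.5], [22.0, 0.6], [21.0, 0.4],
    [80.0, 2.1], [85.0, 2.3], [90.0, 2.0],
]
augmented = add_cluster_feature(toy, n_bins=2, n_clusters=2)
for row in augmented:
    print(row)
```

In a pipeline like the one the abstract outlines, the appended label column would be passed to the classifiers together with the original features. Fitting the binning and clustering on the training portion only would keep the hold-out set unseen.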
id doaj-art-ca32f1c303d34ee8a45ce7c802bceddb
institution DOAJ
issn 2045-2322
spelling doaj-art-ca32f1c303d34ee8a45ce7c802bceddb; 2025-08-20T02:59:35Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-03-01; 15(1), 1-17; https://doi.org/10.1038/s41598-025-91298-0; Aryan Sharma, Tanmay Khade, Shashank Mouli Satapathy (all: School of Computer Science and Engineering, Vellore Institute of Technology)
topic Hepatitis C
Clustering
K-centroid clustering
K-means clustering
K-modes clustering
Machine learning