Predicting high sensitivity C-reactive protein levels and their associations in a large population using decision tree and linear regression

Abstract High-sensitivity C-reactive protein (hs-CRP) is a biomarker of inflammation predicting the incidence of different health pathologies. In this study, we aimed to evaluate the association between hematological and demographic factors with hs-CRP levels using decision tree (DT) and linear regr...

Bibliographic Details
Main Authors: Somayeh Ghiasi Hafezi, Toktam Sahranavard, Alireza Kooshki, Marzieh Hosseini, Amin Mansoori, Elham Amir Fakhrian, Helia Rezaeifard, Mark Ghamsary, Habibollah Esmaily, Majid Ghayour-Mobarhan
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Subjects: High sensitivity C-reactive protein; Hematological factors; Demographic parameters; Decision tree
Online Access: https://doi.org/10.1038/s41598-024-81714-2
_version_ 1832585681485955072
author Somayeh Ghiasi Hafezi
Toktam Sahranavard
Alireza Kooshki
Marzieh Hosseini
Amin Mansoori
Elham Amir Fakhrian
Helia Rezaeifard
Mark Ghamsary
Habibollah Esmaily
Majid Ghayour-Mobarhan
author_sort Somayeh Ghiasi Hafezi
collection DOAJ
description Abstract High-sensitivity C-reactive protein (hs-CRP) is a biomarker of inflammation that predicts the incidence of various health pathologies. In this study, we aimed to evaluate the association of hematological and demographic factors with hs-CRP levels using decision tree (DT) and linear regression (LR) modeling. This study was conducted on a population of 9,704 males and females aged 35 to 65 years recruited from the Mashhad Stroke and Heart Atherosclerotic Disorder (MASHAD) cohort study. We used a data mining approach to construct a predictive model of hs-CRP measurements, employing the DT methodology. The DT model was used to predict hs-CRP level from biochemical factors and clinical features. A total of 9,704 individuals were included in the analysis, 57% of whom were female. Except for fasting blood glucose (FBG), hypertension (HTN), and type 2 diabetes mellitus (T2DM), all variables showed significant differences between the two groups. The results of the LR models showed that variables such as anxiety score, depression score, systolic blood pressure, cardiovascular disease, and HTN were significant in predicting hs-CRP levels. In the DT models, depression score, FBG, cholesterol, and anxiety score were identified as the most important factors in predicting hs-CRP levels. The DT model predicted hs-CRP level with an accuracy of 72.1% in training and 71.4% in testing for both genders. The proposed DT model appears to be able to predict hs-CRP levels based on anxiety score, depression score, fasting blood glucose, systolic blood pressure, and history of cardiovascular disease.
format Article
id doaj-art-02a8065279574afc919709b9ecc51dfb
institution Kabale University
issn 2045-2322
language English
publishDate 2024-12-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-02a8065279574afc919709b9ecc51dfb
2025-01-26T12:35:06Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2024-12-01
14
1
1
16
10.1038/s41598-024-81714-2
Predicting high sensitivity C-reactive protein levels and their associations in a large population using decision tree and linear regression
Somayeh Ghiasi Hafezi (Department of Biostatistics, School of Health, Mashhad University of Medical Sciences)
Toktam Sahranavard (Student Research Committee, Faculty of Pharmacy, Mashhad University of Medical Sciences)
Alireza Kooshki (Student Research Committee, Faculty of Pharmacy, Mashhad University of Medical Sciences)
Marzieh Hosseini (Department of Biostatistics, College of Health, Isfahan University of Medical Sciences)
Amin Mansoori (Department of Applied Mathematics, School of Mathematical Sciences, Ferdowsi University of Mashhad)
Elham Amir Fakhrian (Student Research Committee, Faculty of Pharmacy, Mashhad University of Medical Sciences)
Helia Rezaeifard (Student Research Committee, Faculty of Pharmacy, Mashhad University of Medical Sciences)
Mark Ghamsary (School of Public Health, Loma Linda University)
Habibollah Esmaily (Department of Biostatistics, School of Health, Mashhad University of Medical Sciences)
Majid Ghayour-Mobarhan (Metabolic Syndrome Research Center, Faculty of Medicine, Mashhad University of Medical Sciences)
Abstract High-sensitivity C-reactive protein (hs-CRP) is a biomarker of inflammation that predicts the incidence of various health pathologies. In this study, we aimed to evaluate the association of hematological and demographic factors with hs-CRP levels using decision tree (DT) and linear regression (LR) modeling. This study was conducted on a population of 9,704 males and females aged 35 to 65 years recruited from the Mashhad Stroke and Heart Atherosclerotic Disorder (MASHAD) cohort study. We used a data mining approach to construct a predictive model of hs-CRP measurements, employing the DT methodology. The DT model was used to predict hs-CRP level from biochemical factors and clinical features. A total of 9,704 individuals were included in the analysis, 57% of whom were female. Except for fasting blood glucose (FBG), hypertension (HTN), and type 2 diabetes mellitus (T2DM), all variables showed significant differences between the two groups. The results of the LR models showed that variables such as anxiety score, depression score, systolic blood pressure, cardiovascular disease, and HTN were significant in predicting hs-CRP levels. In the DT models, depression score, FBG, cholesterol, and anxiety score were identified as the most important factors in predicting hs-CRP levels. The DT model predicted hs-CRP level with an accuracy of 72.1% in training and 71.4% in testing for both genders. The proposed DT model appears to be able to predict hs-CRP levels based on anxiety score, depression score, fasting blood glucose, systolic blood pressure, and history of cardiovascular disease.
https://doi.org/10.1038/s41598-024-81714-2
High sensitivity C-reactive protein
Hematological factors
Demographic parameters
Decision tree
title Predicting high sensitivity C-reactive protein levels and their associations in a large population using decision tree and linear regression
topic High sensitivity C-reactive protein
Hematological factors
Demographic parameters
Decision tree
url https://doi.org/10.1038/s41598-024-81714-2