Machine learning to evaluate the effects of non-clinical social determinant features in predicting colorectal Cancer mortality in a medically underserved Appalachian population

Abstract Colorectal cancer (CRC) is the 2nd leading cause of cancer death in the United States (US). Rural Appalachia suffers the highest CRC incidence and mortality rates. There are several non-clinical health-related social determinant factors (SDOH) associated with cancer mortality. This study de...

Bibliographic Details
Main Authors: Aisha Montgomery, Ravi Vadapalli, Frank A. Dinenno, Josh Schilling, Praduman Jain, Aasems Jacob, David Chism, Anil Shanker
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-11074-y
author Aisha Montgomery
Ravi Vadapalli
Frank A. Dinenno
Josh Schilling
Praduman Jain
Aasems Jacob
David Chism
Anil Shanker
collection DOAJ
description Abstract Colorectal cancer (CRC) is the 2nd leading cause of cancer death in the United States (US). Rural Appalachia suffers the highest CRC incidence and mortality rates. There are several non-clinical health-related social determinant factors (SDOH) associated with cancer mortality. This study describes novel predictive modeling that uses demographic, clinical, and SDOH features from health records data from Appalachian community cancer centers to predict 5-year CRC survival. We trained, validated, and tested four gradient-boosted tree ensemble (XGBoost) machine learning models which were developed using selected combinations of available features. The area under the receiver operating characteristic curve was greatest in the model that included SDOH features with demographic and clinical features (0.79; P < 0.0001). Feature stratification showed rurality as the top SDOH feature. It is demonstrated that the ML model performs better when SDOH features are included, and that rurality significantly impacts CRC survival in Appalachia. The study provides preliminary indications that further data collection and evaluation of SDOH factors would strengthen our understanding of their impact on cancer survival in Appalachia and other underserved populations and improve development of strategies for care delivery.
format Article
id doaj-art-dfebb62b5aea45838dc0cc804bb9ab1e
institution DOAJ
issn 2045-2322
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-dfebb62b5aea45838dc0cc804bb9ab1e; 2025-08-20T03:04:25Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-07-01; vol. 15, no. 1, pp. 1-12; https://doi.org/10.1038/s41598-025-11074-y
Author affiliations:
Aisha Montgomery (0): Vibrent Health
Ravi Vadapalli (1): Frost Institute for Data Science and Computing and Electrical and Computer Engineering, University of Miami
Frank A. Dinenno (2): Vibrent Health
Josh Schilling (3): Vibrent Health
Praduman Jain (4): Vibrent Health
Aasems Jacob (5): Pikeville Medical Center
David Chism (6): Thompson Cancer Survival Center
Anil Shanker (7): Meharry Medical College
title Machine learning to evaluate the effects of non-clinical social determinant features in predicting colorectal Cancer mortality in a medically underserved Appalachian population
url https://doi.org/10.1038/s41598-025-11074-y