Machine learning to evaluate the effects of non-clinical social determinant features in predicting colorectal Cancer mortality in a medically underserved Appalachian population
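| Main Authors: | Aisha Montgomery, Ravi Vadapalli, Frank A. Dinenno, Josh Schilling, Praduman Jain, Aasems Jacob, David Chism, Anil Shanker |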
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-11074-y |
| author | Aisha Montgomery, Ravi Vadapalli, Frank A. Dinenno, Josh Schilling, Praduman Jain, Aasems Jacob, David Chism, Anil Shanker |
|---|---|
| collection | DOAJ |
| description | Abstract Colorectal cancer (CRC) is the 2nd leading cause of cancer death in the United States (US). Rural Appalachia suffers the highest CRC incidence and mortality rates. There are several non-clinical health-related social determinant factors (SDOH) associated with cancer mortality. This study describes novel predictive modeling that uses demographic, clinical, and SDOH features from health records data from Appalachian community cancer centers to predict 5-year CRC survival. We trained, validated, and tested four gradient-boosted tree ensemble (XGBoost) machine learning models which were developed using selected combinations of available features. The area under the receiver operating characteristic curve was greatest in the model that included SDOH features with demographic and clinical features (0.79; P < 0.0001). Feature stratification showed rurality as the top SDOH feature. It is demonstrated that the ML model performs better when SDOH features are included, and that rurality significantly impacts CRC survival in Appalachia. The study provides preliminary indications that further data collection and evaluation of SDOH factors would strengthen our understanding of their impact on cancer survival in Appalachia and other underserved populations and improve development of strategies for care delivery. |
| id | doaj-art-dfebb62b5aea45838dc0cc804bb9ab1e |
| institution | DOAJ |
| issn | 2045-2322 |
| affiliations | Aisha Montgomery, Vibrent Health; Ravi Vadapalli, Frost Institute for Data Science and Computing and Electrical and Computer Engineering, University of Miami; Frank A. Dinenno, Vibrent Health; Josh Schilling, Vibrent Health; Praduman Jain, Vibrent Health; Aasems Jacob, Pikeville Medical Center; David Chism, Thompson Cancer Survival Center; Anil Shanker, Meharry Medical College |