Development and Internal Validation of a Machine Learning-Based Colorectal Cancer Risk Prediction Model
**Background:** Colorectal cancer (CRC) remains a leading cause of cancer-related mortality worldwide. While screening tools such as the fecal immunochemical test (FIT) aid in early detection, they do not provide insights into individual risk factors or strategies for primary preventi...
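| Main Authors: | Deborah Jael Herrera, Daiane Maria Seibert, Karen Feyen, Marlon van Loo, Guido Van Hal, Wessel van de Veerdonk |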
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Gastrointestinal Disorders |
| Subjects: | prediction model; colorectal cancer; screening; machine learning; risk threshold |
| Online Access: | https://www.mdpi.com/2624-5647/7/2/26 |
| _version_ | 1849431752709242880 |
|---|---|
| author | Deborah Jael Herrera; Daiane Maria Seibert; Karen Feyen; Marlon van Loo; Guido Van Hal; Wessel van de Veerdonk |
| author_sort | Deborah Jael Herrera |
| collection | DOAJ |
| description | **Background:** Colorectal cancer (CRC) remains a leading cause of cancer-related mortality worldwide. While screening tools such as the fecal immunochemical test (FIT) aid in early detection, they do not provide insights into individual risk factors or strategies for primary prevention. This study aimed to develop and internally validate an interpretable machine learning-based model that estimates an individual’s probability of developing CRC using readily available clinical and lifestyle factors. **Methods:** We analyzed data from 154,887 adults, aged 55–74 years, who participated in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. A risk prediction model was built using the Light Gradient Boosting Machine (LightGBM) algorithm. To translate these findings into clinical practice, we implemented the model into a risk estimator that categorizes individuals as average, increased, or high risk, highlighting modifiable risk factors to support patient–clinician discussions on lifestyle changes. **Results:** The LightGBM model incorporated 12 predictive variables, with age, weight, and smoking history identified as the strongest CRC risk factors, while heart medication use appeared to have a potentially protective effect. The model achieved an area under the receiver operating characteristic curve (AUROC) of 0.726 (95% confidence interval [CI]: 0.698–0.753), correctly distinguishing high-risk from average-risk individuals 73 out of 100 times. **Conclusions:** Our findings suggest that this model could support clinicians and individuals considering screening by guiding informed decision making and facilitating patient–clinician discussions on CRC prevention through personalized lifestyle modifications. However, before clinical implementation, external validation is needed to ensure its reliability across diverse populations and confirm its effectiveness in real-world healthcare settings. |
| format | Article |
| id | doaj-art-37d6e35fa100449796fea7baf5a9ef72 |
| institution | Kabale University |
| issn | 2624-5647 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Gastrointestinal Disorders |
| spelling | doaj-art-37d6e35fa100449796fea7baf5a9ef72; 2025-08-20T03:27:32Z; eng; MDPI AG; Gastrointestinal Disorders; ISSN 2624-5647; 2025-03-01; vol. 7, no. 2, art. 26; DOI 10.3390/gidisord7020026; Development and Internal Validation of a Machine Learning-Based Colorectal Cancer Risk Prediction Model. Authors and affiliations: Deborah Jael Herrera, Guido Van Hal, Wessel van de Veerdonk (Family Medicine and Population Health Department (FAMPOP), Faculty of Medicine and Health Sciences, University of Antwerp, 2610 Antwerp, Belgium); Daiane Maria Seibert, Karen Feyen (Centre of Expertise—Design and Technology, Campus De Nayer, Thomas More University of Applied Sciences, 2860 Sint-Katelijne-Waver, Belgium); Marlon van Loo (Centre of Expertise—Care and Well-Being, Campus Zandpoortvest, Thomas More University of Applied Sciences, 2800 Mechelen, Belgium). URL: https://www.mdpi.com/2624-5647/7/2/26. Keywords: prediction model; colorectal cancer; screening; machine learning; risk threshold |
| title | Development and Internal Validation of a Machine Learning-Based Colorectal Cancer Risk Prediction Model |
| topic | prediction model; colorectal cancer; screening; machine learning; risk threshold |
| url | https://www.mdpi.com/2624-5647/7/2/26 |
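
The abstract reads the AUROC of 0.726 as ranking a case above a non-case about 73 times out of 100. The sketch below illustrates that reading in plain Python: it computes AUROC as the fraction of correctly ordered case/non-case pairs and attaches a percentile bootstrap interval. It is an illustration only, not the authors' pipeline: the scores are synthetic, the function names `auroc` and `bootstrap_ci` are hypothetical, and the record does not say how the published confidence interval was obtained.

```python
import random


def auroc(scores, labels):
    """Fraction of (case, non-case) pairs in which the case has the higher
    score; ties count as half. This equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed method, for illustration)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUROC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic risk scores: cases are drawn from a distribution shifted upward.
data_rng = random.Random(42)
labels = [1] * 40 + [0] * 160
scores = [data_rng.gauss(0.85, 1.0) if y else data_rng.gauss(0.0, 1.0) for y in labels]

point = auroc(scores, labels)
ci = bootstrap_ci(scores, labels)
print(f"AUROC = {point:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The pairwise loop is quadratic in the sample size, which is fine for a small illustration; for a cohort the size of PLCO, a rank-based computation would be the practical choice.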