Development and Internal Validation of a Machine Learning-Based Colorectal Cancer Risk Prediction Model

Background: Colorectal cancer (CRC) remains a leading cause of cancer-related mortality worldwide. While screening tools such as the fecal immunochemical test (FIT) aid in early detection, they do not provide insights into individual risk factors or strategies for primary preventi...

Bibliographic Details
Main Authors: Deborah Jael Herrera, Daiane Maria Seibert, Karen Feyen, Marlon van Loo, Guido Van Hal, Wessel van de Veerdonk
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Gastrointestinal Disorders
Subjects: prediction model; colorectal cancer; screening; machine learning; risk threshold
Online Access: https://www.mdpi.com/2624-5647/7/2/26
collection DOAJ
description Background: Colorectal cancer (CRC) remains a leading cause of cancer-related mortality worldwide. While screening tools such as the fecal immunochemical test (FIT) aid in early detection, they do not provide insights into individual risk factors or strategies for primary prevention. This study aimed to develop and internally validate an interpretable machine learning-based model that estimates an individual’s probability of developing CRC using readily available clinical and lifestyle factors.
Methods: We analyzed data from 154,887 adults, aged 55–74 years, who participated in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. A risk prediction model was built using the Light Gradient Boosting Machine (LightGBM) algorithm. To translate these findings into clinical practice, we implemented the model into a risk estimator that categorizes individuals as average, increased, or high risk, highlighting modifiable risk factors to support patient–clinician discussions on lifestyle changes.
Results: The LightGBM model incorporated 12 predictive variables, with age, weight, and smoking history identified as the strongest CRC risk factors, while heart medication use appeared to have a potentially protective effect. The model achieved an area under the receiver operating characteristic curve (AUROC) of 0.726 (95% confidence interval [CI]: 0.698–0.753), correctly distinguishing high-risk from average-risk individuals 73 out of 100 times.
Conclusions: Our findings suggest that this model could support clinicians and individuals considering screening by guiding informed decision making and facilitating patient–clinician discussions on CRC prevention through personalized lifestyle modifications. However, before clinical implementation, external validation is needed to ensure its reliability across diverse populations and confirm its effectiveness in real-world healthcare settings.
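The "73 out of 100 times" wording in the abstract matches the usual concordance reading of an AUROC: the share of (case, non-case) pairs in which the person who developed the disease received the higher predicted risk. The sketch below illustrates that reading only. It is not the authors' code, the function name `pairwise_auroc` is hypothetical, and the scores and labels are invented toy values rather than PLCO data.

```python
from itertools import product


def pairwise_auroc(scores, labels):
    """AUROC as the share of (case, non-case) pairs ranked correctly.

    A pair counts 1 when the case has the higher score, 0.5 on a tie,
    and 0 otherwise. Labels: 1 = case, 0 = non-case.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c, n in product(cases, non_cases)
    )
    return wins / (len(cases) * len(non_cases))


# Hypothetical predicted risks and outcomes; NOT data from the study.
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.1]
toy_labels = [1, 1, 0, 0, 0, 0]
toy_auroc = pairwise_auroc(toy_scores, toy_labels)
print(f"toy AUROC = {toy_auroc:.3f}")
```

On this reading, the reported AUROC of 0.726 would mean that roughly 73 of every 100 such pairs are ordered correctly by the model's predicted risk.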
id doaj-art-37d6e35fa100449796fea7baf5a9ef72
institution Kabale University
issn 2624-5647
spelling doaj-art-37d6e35fa100449796fea7baf5a9ef72 2025-08-20T03:27:32Z eng
MDPI AG, Gastrointestinal Disorders, ISSN 2624-5647, 2025-03-01, volume 7, issue 2, article 26
DOI: 10.3390/gidisord7020026
Author affiliations:
Deborah Jael Herrera, Guido Van Hal, Wessel van de Veerdonk: Family Medicine and Population Health Department (FAMPOP), Faculty of Medicine and Health Sciences, University of Antwerp, 2610 Antwerp, Belgium
Daiane Maria Seibert, Karen Feyen: Centre of Expertise—Design and Technology, Campus De Nayer, Thomas More University of Applied Sciences, 2860 Sint-Katelijne-Waver, Belgium
Marlon van Loo: Centre of Expertise—Care and Well-Being, Campus Zandpoortvest, Thomas More University of Applied Sciences, 2800 Mechelen, Belgium
topic prediction model
colorectal cancer
screening
machine learning
risk threshold