Integrating machine learning and genetic evidence to uncover novel gene biomarkers for colorectal cancer diagnosis

Abstract From 2020 to 2022, colorectal cancer (CRC) cases increased, making it the third most common cancer and the second leading cause of cancer-related deaths worldwide. Early detection remains a significant challenge due to the lack of reliable diagnostic biomarkers. This study aimed to develop...

Bibliographic Details
Main Authors: Li Zhou, Lihua Yu, Mingjing Liao, Tingting Peng, Leilei Zhang, Chengyun Han, Yuan Li, Jiwang Zhang
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Discover Oncology
Subjects: Colorectal cancer; Diagnostic model; Machine learning; IFITM1; Mendelian randomization
Online Access: https://doi.org/10.1007/s12672-025-02435-0
author Li Zhou
Lihua Yu
Mingjing Liao
Tingting Peng
Leilei Zhang
Chengyun Han
Yuan Li
Jiwang Zhang
collection DOAJ
description Abstract From 2020 to 2022, colorectal cancer (CRC) cases increased, making it the third most common cancer and the second leading cause of cancer-related deaths worldwide. Early detection remains a significant challenge due to the lack of reliable diagnostic biomarkers. This study aimed to develop a robust gene diagnostic model for CRC using publicly available databases, such as GEO and GEPIA2. The approach integrated differential expression analysis, weighted gene co-expression network analysis (WGCNA), and the application of 113 machine learning combinations derived from 12 algorithms. The most effective model was then validated using independent datasets, which included analyses such as Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), protein–protein interaction (PPI) networks, and receiver operating characteristic (ROC) curves, along with assessments of immune infiltration and tumor-node-metastasis (TNM) staging. Notably, the glmBoost + RF algorithm identified an eight-gene diagnostic model with high precision, pinpointing key genes such as CLDN1, IFITM1, and FOXQ1, which exhibited strong diagnostic performance (AUC > 0.9). Furthermore, Mendelian randomization (MR) analysis suggested that IFITM1 may be a potential causal gene for CRC, with significant associations to immune cell profiles and established roles in immune regulation and tumor progression. Collectively, these findings highlight IFITM1, SCGN, and FOXQ1 as promising early diagnostic biomarkers and therapeutic targets for CRC, laying a foundation for future research focused on enhancing early detection and intervention strategies in colorectal cancer management.
format Article
id doaj-art-eee50dc1bdd24b74ba884f7dd33544dd
institution DOAJ
issn 2730-6011
language English
publishDate 2025-05-01
publisher Springer
record_format Article
series Discover Oncology
spelling doaj-art-eee50dc1bdd24b74ba884f7dd33544dd
2025-08-20T03:09:19Z
eng
Springer
Discover Oncology
2730-6011
2025-05-01
161120
10.1007/s12672-025-02435-0
Integrating machine learning and genetic evidence to uncover novel gene biomarkers for colorectal cancer diagnosis
Li Zhou [0]
Lihua Yu [1]
Mingjing Liao [2]
Tingting Peng [3]
Leilei Zhang [4]
Chengyun Han [5]
Yuan Li [6]
Jiwang Zhang [7]
[0] Central Sterile Supply Department, The Affiliated Yongchuan Hospital of Chongqing Medical University
[1] Department of Clinical Laboratory, The Affiliated Yongchuan Hospital of Chongqing Medical University
[2] Department of Clinical Laboratory, The Affiliated Yongchuan Hospital of Chongqing Medical University
[3] Department of Clinical Laboratory, The Affiliated Yongchuan Hospital of Chongqing Medical University
[4] Department of Clinical Laboratory, The Affiliated Yongchuan Hospital of Chongqing Medical University
[5] Department of Clinical Laboratory, The Affiliated Yongchuan Hospital of Chongqing Medical University
[6] Central Laboratory, The Affiliated Yongchuan Hospital of Chongqing Medical University
[7] Department of Clinical Laboratory, The Affiliated Yongchuan Hospital of Chongqing Medical University
https://doi.org/10.1007/s12672-025-02435-0
Colorectal cancer
Diagnostic model
Machine learning
IFITM1
Mendelian randomization
title Integrating machine learning and genetic evidence to uncover novel gene biomarkers for colorectal cancer diagnosis
topic Colorectal cancer
Diagnostic model
Machine learning
IFITM1
Mendelian randomization
url https://doi.org/10.1007/s12672-025-02435-0