Development and validation of a biomarker-based prediction model for metastasis in patients with colorectal cancer: Application of machine learning algorithms
| Main Authors: | Erfan Ayubi, Sajjad Farashi, Leili Tapak, Saeid Afshar |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-01-01 |
| Series: | Heliyon |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2405844024174749 |
| author | Erfan Ayubi, Sajjad Farashi, Leili Tapak, Saeid Afshar |
|---|---|
| collection | DOAJ |
| description | Objective: The purpose of the current study was to develop and validate a biomarker-based prediction model for metastasis in patients with colorectal cancer (CRC). Methods: Two datasets, GSE68468 and GSE41568, were retrieved from the Gene Expression Omnibus (GEO) database. In the GSE68468 dataset, key biomarkers were identified through a screening process involving differential expression analysis, redundancy analysis, and recursive feature elimination. Subsequently, the prediction model was developed and internally validated using five machine learning (ML) algorithms: lasso and elastic-net regularized generalized linear model (glmnet), k-nearest neighbors (kNN), support vector machine (SVM) with a radial basis function kernel, random forest (RF), and eXtreme Gradient Boosting (XGBoost). The algorithm with the highest accuracy was then externally validated on the GSE41568 dataset. Results: Among the 22,283 registered genes in the GSE68468 dataset, the screening process identified 16 key genes (MMP3, CCDC102B, CDH2, SCGB1A1, KRT7, CYP1B1, LAMC3, ALB, DIXDC1, VWF, MMP1, CYP4B1, NKX3-2, TMEM158, GADD45B, and SERPINA1), which were used to build the prediction model. On the internal validation dataset, the five ML algorithms performed as follows: RF (accuracy = 0.97, kappa = 0.91), XGBoost (0.93, 0.81), kNN (0.93, 0.81), glmnet (0.93, 0.82), and SVM (0.92, 0.80). The top five biomarkers were MMP3, CCDC102B, CDH2, VWF, and MMP1. On the external validation dataset, the RF model achieved an accuracy of 0.97, a kappa of 0.92, and an area under the curve (AUC) of 0.99. Conclusion: Using ML algorithms, this study identified biomarkers that help identify patients with CRC who are prone to metastasis. |
| format | Article |
| id | doaj-art-8a3582f01f3c442f9b3c56befb527589 |
| institution | DOAJ |
| issn | 2405-8440 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Heliyon |
| doi | 10.1016/j.heliyon.2024.e41443 |
| citation | Heliyon, vol. 11, no. 1, article e41443 |
| affiliation (Erfan Ayubi) | Cancer Research Center, Institute of Cancer, Avicenna Health Research Institute, Hamadan University of Medical Sciences, Hamadan, Iran |
| affiliation (Sajjad Farashi) | Neurophysiology Research Center, Institute of Neuroscience and Mental Health, Avicenna Health Research Institute, Hamadan University of Medical Sciences, Hamadan, Iran |
| affiliation (Leili Tapak) | Modeling of Noncommunicable Diseases Research Center, Institute of Health Sciences and Technologies, Avicenna Health Research Institute, Hamadan University of Medical Sciences, Hamadan, Iran |
| affiliation (Saeid Afshar, corresponding author) | Cancer Research Center, Institute of Cancer, Avicenna Health Research Institute, Hamadan University of Medical Sciences, Hamadan, Iran; Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Hamadan University of Medical Sciences, Hamadan, Iran |
| title | Development and validation of a biomarker-based prediction model for metastasis in patients with colorectal cancer: Application of machine learning algorithms |
| topic | Colorectal cancer; Metastasis; Machine learning; Biomarker |
| url | http://www.sciencedirect.com/science/article/pii/S2405844024174749 |
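
The abstract lists "redundancy analysis" as one biomarker-screening step but does not say how it was carried out. A common approach is to drop any gene that is highly correlated with a gene already retained. The Python sketch below illustrates only that general idea; the greedy keep-first rule, the 0.9 cutoff, and the gene names `GENE_A`, `GENE_B`, and `GENE_C` are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def drop_redundant(expression, cutoff=0.9):
    """Keep a gene only if |r| <= cutoff with every gene kept before it.

    Hypothetical greedy filter (not the study's method).
    expression: dict mapping a gene name to its expression values,
    with samples in the same order for every gene.
    Returns the kept gene names in input order.
    """
    kept = []
    for gene, values in expression.items():
        if all(abs(pearson(values, expression[k])) <= cutoff for k in kept):
            kept.append(gene)
    return kept


# Hypothetical toy data: GENE_B nearly duplicates GENE_A, GENE_C does not.
toy = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [1.1, 2.1, 2.9, 4.2, 5.0],
    "GENE_C": [2.0, 1.0, 4.0, 3.0, 2.0],
}
print(drop_redundant(toy))  # ['GENE_A', 'GENE_C']
```

Because the rule is greedy, the result depends on input order; ranking genes beforehand (for example by differential-expression significance) would ensure that the stronger candidate in each correlated pair is the one retained.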
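
The reported results use accuracy, Cohen's kappa, and AUC. The following Python sketch computes these three measures from their standard definitions for a binary outcome, assuming the coding 1 = metastasis and 0 = no metastasis. The labels, scores, and 0.5 decision threshold are made-up toy values, not data from GSE68468 or GSE41568, and the code is not the authors' implementation.

```python
def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for binary labels coded 0/1."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    p_observed = (tp + tn) / n  # this is the accuracy
    # Agreement expected by chance, from the confusion-matrix marginals:
    # actual positives x predicted positives + actual negatives x predicted negatives.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    # Undefined (division by zero) if p_chance == 1, e.g. one class everywhere.
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: hypothetical predicted metastasis probabilities for 10 patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.4, 0.35, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(f"accuracy={acc:.2f} kappa={kappa:.2f} auc={auc(y_true, scores):.3f}")
# accuracy=0.80 kappa=0.58 auc=0.875
```

Kappa comes out lower than accuracy because it first subtracts the agreement expected by chance (0.52 in this toy example) and then rescales. AUC, by contrast, uses the continuous scores rather than thresholded predictions.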