Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types
Abstract The growing burden of cancer and recent surge in healthcare data availability call for new ways of analysing this multifactorial disease and improving patient outcomes. The aim of this study is to develop and evaluate prognostic cancer survival models across ten common cancer types based on...
| Main Authors: | Jurgita Gammall, Alvina G. Lai |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer 2025-05-01 |
| Series: | Discover Oncology |
| Subjects: | Cancer; Prognosis; Survival; Predictive model; Machine learning; Genetics |
| Online Access: | https://doi.org/10.1007/s12672-025-02523-1 |
| _version_ | 1849725184076939264 |
|---|---|
| author | Jurgita Gammall; Alvina G. Lai |
| author_facet | Jurgita Gammall; Alvina G. Lai |
| author_sort | Jurgita Gammall |
| collection | DOAJ |
| description | Abstract The growing burden of cancer and recent surge in healthcare data availability call for new ways of analysing this multifactorial disease and improving patient outcomes. The aim of this study is to develop and evaluate prognostic cancer survival models across ten common cancer types based on a large patient sample. We compare the performance of different machine learning algorithms and assess the added value of genetic information in cancer prognosis. We also provide ways to improve model explainability, which is critical for model adoption in clinical practice. This study included data from 9977 patients with bladder, breast, colorectal, endometrial, glioma, leukaemia, lung, ovarian, prostate, and renal cancers. Genetic data collected through the 100,000 Genomes Project was linked with clinical and demographic data provided by the National Cancer Registration and Analysis Service, Hospital Episode Statistics and Office for National Statistics. More than 500 prognostic features were assessed, and four machine learning algorithms were developed: Elastic Net Cox proportional hazards regression, random survival forest, gradient boosting survival and the DeepSurv neural network. Most models achieved good performance, with the C-index ranging from 60% in bladder cancer to 80% in glioma and averaging 72% across all cancer types. Different machine learning methods achieved similar performance, with the DeepSurv model slightly underperforming the other methods. Addition of genetic data improved performance in endometrial, glioma, ovarian and prostate cancers, showing its potential importance for cancer prognosis. Patient age, stage, grade, referral route, waiting times, pre-existing conditions, previous hospital utilisation, tumour mutational burden and mutations in the gene TP53 were among the most important features in cancer survival modelling. By offering a comprehensive set of predictive models for cancer survival, this study fills a critical gap in our understanding of cancer prognosis and provides new tools for informing cancer treatment and consequently improving patient outcomes. |
| format | Article |
| id | doaj-art-25efcdd28fdd44448a256b79d693b710 |
| institution | DOAJ |
| issn | 2730-6011 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Oncology |
| spelling | doaj-art-25efcdd28fdd44448a256b79d693b710 2025-08-20T03:10:32Z eng Springer Discover Oncology 2730-6011 2025-05-01 161120 10.1007/s12672-025-02523-1 Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types Jurgita Gammall 0 Alvina G. Lai 1 Institute of Health Informatics, University College London Institute of Health Informatics, University College London Abstract The growing burden of cancer and recent surge in healthcare data availability call for new ways of analysing this multifactorial disease and improving patient outcomes. The aim of this study is to develop and evaluate prognostic cancer survival models across ten common cancer types based on a large patient sample. We compare the performance of different machine learning algorithms and assess the added value of genetic information in cancer prognosis. We also provide ways to improve model explainability, which is critical for model adoption in clinical practice. This study included data from 9977 patients with bladder, breast, colorectal, endometrial, glioma, leukaemia, lung, ovarian, prostate, and renal cancers. Genetic data collected through the 100,000 Genomes Project was linked with clinical and demographic data provided by the National Cancer Registration and Analysis Service, Hospital Episode Statistics and Office for National Statistics. More than 500 prognostic features were assessed, and four machine learning algorithms were developed: Elastic Net Cox proportional hazards regression, random survival forest, gradient boosting survival and the DeepSurv neural network. Most models achieved good performance, with the C-index ranging from 60% in bladder cancer to 80% in glioma and averaging 72% across all cancer types. Different machine learning methods achieved similar performance, with the DeepSurv model slightly underperforming the other methods. Addition of genetic data improved performance in endometrial, glioma, ovarian and prostate cancers, showing its potential importance for cancer prognosis. Patient age, stage, grade, referral route, waiting times, pre-existing conditions, previous hospital utilisation, tumour mutational burden and mutations in the gene TP53 were among the most important features in cancer survival modelling. By offering a comprehensive set of predictive models for cancer survival, this study fills a critical gap in our understanding of cancer prognosis and provides new tools for informing cancer treatment and consequently improving patient outcomes. https://doi.org/10.1007/s12672-025-02523-1 Cancer Prognosis Survival Predictive model Machine learning Genetics |
| spellingShingle | Jurgita Gammall Alvina G. Lai Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types Discover Oncology Cancer Prognosis Survival Predictive model Machine learning Genetics |
| title | Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types |
| title_full | Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types |
| title_fullStr | Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types |
| title_full_unstemmed | Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types |
| title_short | Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types |
| title_sort | pan cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types |
| topic | Cancer; Prognosis; Survival; Predictive model; Machine learning; Genetics |
| url | https://doi.org/10.1007/s12672-025-02523-1 |
| work_keys_str_mv | AT jurgitagammall pancancerpredictivesurvivalmodeldevelopmentandevaluationusingelectronichealthrecordandgeneticdataacross10cancertypes AT alvinaglai pancancerpredictivesurvivalmodeldevelopmentandevaluationusingelectronichealthrecordandgeneticdataacross10cancertypes |
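
The abstract in this record reports model performance as a C-index (concordance index). As a reading aid, the following is a minimal sketch of how Harrell's concordance index can be computed for right-censored survival data. It is not the authors' code: the function name `concordance_index`, its parameters, and the toy numbers in the usage note are hypothetical and chosen only for illustration. It assumes that a higher risk score means shorter expected survival, and it skips pairs with tied survival times.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up time per patient
    events      -- True if the event (death) was observed, False if censored
    risk_scores -- model output; higher is assumed to mean higher risk
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("all inputs must have the same length")

    concordant = 0.0  # pairs ordered correctly (ties in score count as 0.5)
    comparable = 0    # pairs whose true ordering is known despite censoring

    for i in range(len(times)):
        if not events[i]:
            # A censored patient cannot anchor a pair: we do not know
            # when their event would have happened.
            continue
        for j in range(len(times)):
            # Patient i had the event strictly before patient j's
            # observed time, so i is known to have survived less long.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

For example, with invented follow-up times `[5, 10, 15, 20]`, event flags `[True, True, False, True]` and risk scores `[0.9, 0.6, 0.7, 0.1]`, five pairs are comparable and four are ranked correctly, giving a C-index of 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 60% to 80% figures in the abstract should be read.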