Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review
Abstract Background Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored wi...
Main Authors: | Victoria Moglia, Owen Johnson, Gordon Cook, Marc de Kamps, Lesley Smith |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2025-01-01 |
Series: | BMC Medical Research Methodology |
Subjects: | Machine learning, Health data, Longitudinal data, Cancer, Time-series, Temporal |
Online Access: | https://doi.org/10.1186/s12874-025-02473-w |
_version_ | 1832571611143733248 |
---|---|
author | Victoria Moglia; Owen Johnson; Gordon Cook; Marc de Kamps; Lesley Smith |
author_sort | Victoria Moglia |
collection | DOAJ |
description | Abstract. Background: Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review aims to summarise methods currently utilised for prediction of cancer from longitudinal data and provides recommendations on how such models should be developed. Methods: The review was conducted following PRISMA-ScR guidance. Six databases (MEDLINE, EMBASE, Web of Science, IEEE Xplore, PubMed and SCOPUS) were searched for relevant records published before 2 February 2024. Search terms related to the concepts “artificial intelligence”, “prediction”, “health records”, “longitudinal”, and “cancer”. Data were extracted relating to several areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework for terminology relating to reporting of cancer detection and risk prediction models. Results: Of 653 records screened, 33 were included in the review; 10 predicted risk of cancer, 18 performed either cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most common cancers predicted in the studies were colorectal (n = 9) and pancreatic cancer (n = 9). Sixteen studies used feature engineering to represent temporal data, with the most common features representing trends. Eighteen used deep learning models which take a direct sequential input, most commonly recurrent neural networks, but also including convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of the studies. This risk was often introduced due to inappropriate study design (n = 26) and sample size (n = 26). Conclusion: This review highlights the breadth of approaches to cancer prediction from longitudinal data. We identify areas where reporting of methods could be improved, particularly regarding where in a patient’s trajectory the model is applied. The review shows opportunities for further work, including comparison of these approaches and their applications in other cancers. |
format | Article |
id | doaj-art-bb08375ab2044fb787acea11d0cfbf7e |
institution | Kabale University |
issn | 1471-2288 |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Research Methodology |
spelling | doaj-art-bb08375ab2044fb787acea11d0cfbf7e; 2025-02-02T12:30:21Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2025-01-01; 25; 1; 1; 17; 10.1186/s12874-025-02473-w; Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review; Victoria Moglia (0), Owen Johnson (1), Gordon Cook (2), Marc de Kamps (3), Lesley Smith (4); School of Computing, University of Leeds; School of Computing, University of Leeds; Leeds Institute of Clinical Trials Research, University of Leeds; School of Computing, University of Leeds; Leeds Institute of Clinical Trials Research, University of Leeds; https://doi.org/10.1186/s12874-025-02473-w; Machine learning; Health data; Longitudinal data; Cancer; Time-series; Temporal |
title | Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review |
topic | Machine learning; Health data; Longitudinal data; Cancer; Time-series; Temporal |
url | https://doi.org/10.1186/s12874-025-02473-w |