A machine learning approach for multimodal data fusion for survival prediction in cancer patients

Bibliographic Details
Main Authors: Nikolaos Nikolaou, Domingo Salazar, Harish RaviPrakash, Miguel Gonçalves, Rob Mulla, Nikolay Burlutskiy, Natasha Markuzon, Etai Jacob
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: npj Precision Oncology
Online Access: https://doi.org/10.1038/s41698-025-00917-6
ISSN: 2397-768X
Affiliation: Oncology Data Science, Oncology R&D, AstraZeneca (all authors)
Collection: DOAJ

Abstract
Technological advancements of the past decade have transformed cancer research, improving patient survival predictions through genotyping and multimodal data analysis. However, there is no comprehensive machine-learning pipeline for comparing methods to enhance these predictions. To address this, a versatile pipeline using The Cancer Genome Atlas (TCGA) data was developed, incorporating various data modalities such as transcripts, proteins, metabolites, and clinical factors. This approach manages challenges like high dimensionality, small sample sizes, and data heterogeneity. By applying different feature extraction and fusion strategies, notably late fusion models, the effectiveness of integrating diverse data types was demonstrated. Late fusion models consistently outperformed single-modality approaches in TCGA lung, breast, and pan-cancer datasets, offering higher accuracy and robustness. This research highlights the potential of comprehensive multimodal data integration in precision oncology to improve survival predictions for cancer patients. The study provides a reusable pipeline for the research community, suggesting future work on larger cohorts.
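To illustrate what decision-level ("late") fusion means in the abstract, here is a minimal Python sketch. It is not the authors' pipeline and makes several assumptions: each modality is taken to already have its own trained survival model that outputs one risk score per patient, the scores are fused by a (optionally weighted) mean of z-scored risks, and ranking quality is measured with Harrell's concordance index. The function names, modality names, and toy data are hypothetical.

```python
"""Illustrative late-fusion sketch for survival risk scores (hypothetical, not the paper's code)."""
from statistics import mean, pstdev


def standardize(scores):
    """Z-score one modality's risk scores so all modalities share a common scale."""
    mu = mean(scores)
    sd = pstdev(scores) or 1.0  # guard against a constant score vector
    return [(s - mu) / sd for s in scores]


def late_fusion(modality_scores, weights=None):
    """Fuse per-modality risk predictions at the decision level.

    modality_scores maps a modality name to one risk score per patient
    (same patient order in every list; higher score = higher risk).
    Returns the weighted mean of the z-scored risks for each patient.
    """
    names = list(modality_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total = sum(weights[name] for name in names)
    z = {name: standardize(modality_scores[name]) for name in names}
    n_patients = len(z[names[0]])
    return [
        sum(weights[name] * z[name][i] for name in names) / total
        for i in range(n_patients)
    ]


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable patient pairs ranked correctly.

    A pair (i, j) counts as comparable when patient i had an observed event
    (events[i] == 1) and a strictly shorter time than patient j. Tied risks
    score 0.5; pairs with tied times are skipped in this simple version.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    # Hand-made toy data: 4 patients, all with an observed event (1 = event).
    times = [1, 2, 3, 4]
    events = [1, 1, 1, 1]
    scores = {
        "transcripts": [3, 4, 2, 1],  # hypothetical unimodal model outputs
        "clinical": [4, 2, 3, 1],
    }
    for name, risks in scores.items():
        print(name, round(concordance_index(times, events, risks), 3))  # 0.833
    fused = late_fusion(scores)
    print("late fusion", round(concordance_index(times, events, fused), 3))  # 1.0
```

On this hand-made toy example each modality alone misorders one patient pair, while the fused score orders all six pairs correctly; this only illustrates the mechanics and says nothing about real data. Because fusion happens on model outputs, each modality can keep its own feature extraction and model, which is one commonly cited advantage when modalities are high-dimensional and heterogeneous. Learning the weights on a validation split would be a natural extension of this sketch.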