Multimodal AI/ML for discovering novel biomarkers and predicting disease using multi-omics profiles of patients with cardiovascular diseases

Abstract Cardiovascular diseases (CVDs) are complex, multifactorial conditions that require personalized assessment and treatment. Advancements in multi-omics technologies, namely RNA sequencing and whole-genome sequencing, have provided translational researchers with a comprehensive view of the hum...

Bibliographic Details
Main Authors: William DeGroat, Habiba Abdelhalim, Elizabeth Peker, Neev Sheth, Rishabh Narayanan, Saman Zeeshan, Bruce T. Liang, Zeeshan Ahmed
Format: Article
Language: English
Published: Nature Portfolio 2024-11-01
Series: Scientific Reports
Subjects: Artificial Intelligence; Machine learning; Multi-omics; Genomics; Cardiovascular diseases
Online Access: https://doi.org/10.1038/s41598-024-78553-6
author William DeGroat
Habiba Abdelhalim
Elizabeth Peker
Neev Sheth
Rishabh Narayanan
Saman Zeeshan
Bruce T. Liang
Zeeshan Ahmed
collection DOAJ
description Abstract Cardiovascular diseases (CVDs) are complex, multifactorial conditions that require personalized assessment and treatment. Advancements in multi-omics technologies, namely RNA sequencing and whole-genome sequencing, have provided translational researchers with a comprehensive view of the human genome. The efficient synthesis and analysis of this data through an integrated approach that characterizes genetic variants alongside expression patterns linked to emerging phenotypes can reveal novel biomarkers and enable the segmentation of patient populations based on personalized risk factors. In this study, we present a cutting-edge methodology rooted in the integration of traditional bioinformatics, classical statistics, and multimodal machine learning techniques. Our approach has the potential to uncover the intricate mechanisms underlying CVD, enabling patient-specific risk and response profiling. We sourced transcriptomic expression data and single nucleotide polymorphisms (SNPs) from both CVD patients and healthy controls. By integrating these multi-omics datasets with clinical demographic information, we generated patient-specific profiles. Utilizing a robust feature selection approach, we identified a signature of 27 transcriptomic features and SNPs that are effective predictors of CVD. Differential expression analysis, combined with minimum redundancy maximum relevance feature selection, highlighted biomarkers that explain the disease phenotype. This approach prioritizes both biological relevance and efficiency in machine learning. We employed Combined Annotation Dependent Depletion (CADD) scores and allele frequencies to identify variants with pathogenic characteristics in CVD patients. Classification models trained on this signature demonstrated high-accuracy predictions for CVD.
The best performing of these models was an XGBoost classifier optimized via Bayesian hyperparameter tuning, which correctly classified all patients in our test dataset. Using SHapley Additive exPlanations (SHAP), we created risk assessments for patients, offering further contextualization of these predictions in a clinical setting. Across the cohort, RPL36AP37 and HBA1 were scored as the most important biomarkers for predicting CVDs. A comprehensive literature review revealed that a substantial portion of the diagnostic biomarkers identified have previously been associated with CVD. The framework we propose in this study is unbiased and generalizable to other diseases and disorders.
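The abstract above names minimum redundancy maximum relevance (mRMR) feature selection but gives no implementation details. The following is only a minimal sketch of the general greedy mRMR idea, not the authors' pipeline. It assumes absolute Pearson correlation as a stand-in for the mutual information that mRMR is usually defined with, and it uses the difference criterion (relevance minus mean redundancy). The feature names (`gene_a`, `gene_a_dup`, `gene_b`, `gene_noise`), the function names, and the toy values are all hypothetical and are not taken from the study.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mrmr_select(features, labels, k):
    """Greedy mRMR sketch: repeatedly pick the feature that maximises
    relevance to the labels minus mean redundancy with features already chosen.
    Assumption: |Pearson r| approximates both relevance and redundancy."""
    relevance = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]  # first pick: relevance only
            redundancy = mean(
                abs(pearson(features[name], features[s])) for s in selected
            )
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 4 controls (label 0) followed by 4 cases (label 1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [1, 2, 1, 2, 5, 6, 5, 6],      # strongly associated with the label
    "gene_a_dup": [1, 2, 1, 3, 5, 6, 5, 6],  # near-copy of gene_a (redundant)
    "gene_b": [3, 1, 2, 2, 4, 2, 3, 5],      # weaker, but less redundant signal
    "gene_noise": [2, 3, 2, 3, 2, 3, 2, 3],  # unrelated to the label
}

selected = mrmr_select(features, labels, k=2)
print(selected)
```

In this toy example the near-duplicate feature has the second-highest relevance on its own, yet the redundancy penalty lets the less correlated `gene_b` take the second slot, which is the behaviour mRMR is designed to produce.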
format Article
id doaj-art-b57cbcd9db094e298b7798240d408a22
institution Kabale University
issn 2045-2322
language English
publishDate 2024-11-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-b57cbcd9db094e298b7798240d408a22
2025-08-20T03:26:43Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2024-11-01
141117
10.1038/s41598-024-78553-6
Multimodal AI/ML for discovering novel biomarkers and predicting disease using multi-omics profiles of patients with cardiovascular diseases
William DeGroat (Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey)
Habiba Abdelhalim (Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey)
Elizabeth Peker (Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey)
Neev Sheth (Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey)
Rishabh Narayanan (Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey)
Saman Zeeshan (Department of Biomedical and Health Informatics, UMKC School of Medicine)
Bruce T. Liang (Pat and Jim Calhoun Cardiology Center, UConn Health)
Zeeshan Ahmed (Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey)
https://doi.org/10.1038/s41598-024-78553-6
Artificial Intelligence
Machine learning
Multi-omics
Genomics
Cardiovascular diseases
title Multimodal AI/ML for discovering novel biomarkers and predicting disease using multi-omics profiles of patients with cardiovascular diseases
topic Artificial Intelligence
Machine learning
Multi-omics
Genomics
Cardiovascular diseases
url https://doi.org/10.1038/s41598-024-78553-6