Integrating machine learning with mendelian randomization for unveiling causal gene networks in glioblastoma multiforme


Bibliographic Details
Main Authors: Lixin Du, Pan Wang, Xiaoting Qiu, Zhigang Li, Jianlan Ma, Pengfei Chen
Format: Article
Language: English
Published: Springer 2025-01-01
Series: Discover Oncology
Subjects: Glioblastoma multiforme; Machine learning; Mendelian randomization; Gene co-expression analysis
Online Access: https://doi.org/10.1007/s12672-025-01792-0
collection DOAJ
description Abstract
Background: Glioblastoma multiforme (GBM) is a highly aggressive brain cancer with poor prognosis and limited treatment options. Despite advances in understanding its molecular mechanisms, effective therapeutic strategies remain elusive due to the tumor’s genetic complexity and heterogeneity.
Methods: This study employed a comprehensive analysis approach integrating 113 machine learning algorithms with Mendelian Randomization (MR) analysis to investigate the molecular underpinnings of GBM. Five publicly available gene expression datasets were analyzed to identify differentially expressed genes (DEGs) associated with GBM. Weighted Gene Co-expression Network Analysis (WGCNA) was used to identify GBM-related gene modules. Further, gene set enrichment and variation analyses were conducted to explore the biological pathways involved. The machine learning models were evaluated using Receiver Operating Characteristic (ROC) curves and confusion matrices to assess their predictive accuracy, with the best-performing model validated across external datasets. MR analysis was performed to establish causal relationships between genetically predicted gene expression levels and GBM outcomes.
Results: The study identified 286 DEGs between GBM and adjacent normal tissues across five datasets. WGCNA highlighted the yellow module as the most relevant to GBM, containing key genes such as KLHL3, FOXO4, and MAP1A. Of the 113 machine learning models tested, Ridge regression achieved the highest area under the curve (AUC) of 0.92, demonstrating robust predictive accuracy. Validation using external datasets confirmed the model's reliability, with a classification accuracy of 89.5% in the training set and 85.3% in the validation sets. MR analysis provided strong evidence of a causal relationship between the expression levels of the identified genes and GBM risk.
Conclusions: This study demonstrates the power of combining machine learning and Mendelian Randomization to uncover novel genetic markers for GBM. The identified genes offer promising potential as biomarkers for GBM diagnosis and therapy, providing new avenues for personalized treatment strategies.
format Article
id doaj-art-01a9af22500e422e85fef20d9b30562a
institution Kabale University
issn 2730-6011
language English
publishDate 2025-01-01
publisher Springer
record_format Article
series Discover Oncology
spelling doaj-art-01a9af22500e422e85fef20d9b30562a
2025-01-19T12:29:17Z
eng
Springer
Discover Oncology
2730-6011
2025-01-01
161119
10.1007/s12672-025-01792-0
Integrating machine learning with mendelian randomization for unveiling causal gene networks in glioblastoma multiforme
Lixin Du, Pan Wang, Xiaoting Qiu, Zhigang Li, Jianlan Ma, Pengfei Chen
Affiliation (all six authors): Department of Medical Imaging, Shenzhen Longhua District Key Laboratory of Neuroimaging, Shenzhen Longhua District Central Hospital
https://doi.org/10.1007/s12672-025-01792-0
Glioblastoma multiforme; Machine learning; Mendelian randomization; Gene co-expression analysis
title Integrating machine learning with mendelian randomization for unveiling causal gene networks in glioblastoma multiforme
topic Glioblastoma multiforme
Machine learning
Mendelian randomization
Gene co-expression analysis
url https://doi.org/10.1007/s12672-025-01792-0