Integrating machine learning with mendelian randomization for unveiling causal gene networks in glioblastoma multiforme
Abstract Background Glioblastoma multiforme (GBM) is a highly aggressive brain cancer with poor prognosis and limited treatment options. Despite advances in understanding its molecular mechanisms, effective therapeutic strategies remain elusive due to the tumor’s genetic complexity and heterogeneity...
Main Authors: | Lixin Du, Pan Wang, Xiaoting Qiu, Zhigang Li, Jianlan Ma, Pengfei Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2025-01-01 |
Series: | Discover Oncology |
Subjects: | Glioblastoma multiforme; Machine learning; Mendelian randomization; Gene co-expression analysis |
Online Access: | https://doi.org/10.1007/s12672-025-01792-0 |
_version_ | 1832594593852424192 |
---|---|
author | Lixin Du, Pan Wang, Xiaoting Qiu, Zhigang Li, Jianlan Ma, Pengfei Chen |
author_sort | Lixin Du |
collection | DOAJ |
description | Background: Glioblastoma multiforme (GBM) is a highly aggressive brain cancer with poor prognosis and limited treatment options. Despite advances in understanding its molecular mechanisms, effective therapeutic strategies remain elusive because of the tumor’s genetic complexity and heterogeneity. Methods: This study integrated 113 machine learning algorithms with Mendelian randomization (MR) analysis to investigate the molecular underpinnings of GBM. Five publicly available gene expression datasets were analyzed to identify differentially expressed genes (DEGs) associated with GBM. Weighted Gene Co-expression Network Analysis (WGCNA) was used to identify GBM-related gene modules, and gene set enrichment and variation analyses were conducted to explore the biological pathways involved. The machine learning models were evaluated with Receiver Operating Characteristic (ROC) curves and confusion matrices, and the best-performing model was validated on external datasets. MR analysis was performed to test for causal relationships between genetically predicted gene expression levels and GBM outcomes. Results: The study identified 286 DEGs between GBM and adjacent normal tissues across the five datasets. WGCNA highlighted the yellow module as the most relevant to GBM; it contained key genes such as KLHL3, FOXO4, and MAP1A. Of the 113 machine learning models tested, Ridge regression achieved the highest area under the curve (AUC), 0.92. Classification accuracy was 89.5% in the training set and 85.3% in the external validation sets. MR analysis provided strong evidence of a causal relationship between the expression levels of the identified genes and GBM risk. Conclusions: Combining machine learning with Mendelian randomization can uncover novel genetic markers for GBM. The identified genes are promising candidate biomarkers for GBM diagnosis and therapy and may open new avenues for personalized treatment strategies. |
format | Article |
id | doaj-art-01a9af22500e422e85fef20d9b30562a |
institution | Kabale University |
issn | 2730-6011 |
language | English |
publishDate | 2025-01-01 |
publisher | Springer |
record_format | Article |
series | Discover Oncology |
spelling | doaj-art-01a9af22500e422e85fef20d9b30562a; 2025-01-19T12:29:17Z; eng; Springer; Discover Oncology; ISSN 2730-6011; 2025-01-01; vol. 16, no. 1, pp. 1-19; 10.1007/s12672-025-01792-0; Integrating machine learning with mendelian randomization for unveiling causal gene networks in glioblastoma multiforme; Lixin Du, Pan Wang, Xiaoting Qiu, Zhigang Li, Jianlan Ma, Pengfei Chen (all: Department of Medical Imaging, Shenzhen Longhua District Key Laboratory of Neuroimaging, Shenzhen Longhua District Central Hospital); https://doi.org/10.1007/s12672-025-01792-0; Glioblastoma multiforme; Machine learning; Mendelian randomization; Gene co-expression analysis |
title | Integrating machine learning with mendelian randomization for unveiling causal gene networks in glioblastoma multiforme |
topic | Glioblastoma multiforme; Machine learning; Mendelian randomization; Gene co-expression analysis |
url | https://doi.org/10.1007/s12672-025-01792-0 |
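
The abstract reports that the classifiers were compared by ROC AUC and confusion matrices. The record does not describe the authors' code, so the following is only a minimal sketch of how those two metrics can be computed from predicted scores. The function names, the 0.5 decision threshold, and the toy labels and scores are all illustrative assumptions and are not taken from the study.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_matrix(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, int]:
    """Count TP/FP/TN/FN after thresholding scores (threshold is an assumption)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(cm: Dict[str, int]) -> float:
    """Share of samples classified correctly."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


if __name__ == "__main__":
    # Toy data only: 1 = tumor, 0 = normal; scores are made-up model outputs.
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    cm = confusion_matrix(toy_labels, toy_scores)
    print("AUC:", roc_auc(toy_labels, toy_scores))
    print("confusion matrix:", cm)
    print("accuracy:", accuracy(cm))
```

AUC is threshold-free, while accuracy depends on the chosen cut-off, which is one reason a single model can be summarized by both an AUC and separate accuracy figures.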