Integrating transcriptomics and hybrid machine learning enables high-accuracy diagnostic modeling for nasopharyngeal carcinoma
Abstract. Background: Nasopharyngeal carcinoma (NPC) lacks biomarkers demonstrating both high specificity and sensitivity for early diagnosis. This study aimed to develop robust machine learning (ML)-driven diagnostic models and identify key biomarkers through integrated analysis of multi-cohort trans...
| Main Authors: | Hehe Wang, Junge Zhang, Peng Cheng, Lujie Yu, Chunlin Li, Yaowen Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-06-01 |
| Series: | Discover Oncology |
| Subjects: | Nasopharyngeal carcinoma; Machine learning; Diagnostic model; Biomarker discovery; RCN1 |
| Online Access: | https://doi.org/10.1007/s12672-025-02932-2 |
| _version_ | 1849334593170178048 |
|---|---|
| author | Hehe Wang; Junge Zhang; Peng Cheng; Lujie Yu; Chunlin Li; Yaowen Wang |
| author_sort | Hehe Wang |
| collection | DOAJ |
| description | Abstract. Background: Nasopharyngeal carcinoma (NPC) lacks biomarkers that combine high specificity and high sensitivity for early diagnosis. This study aimed to develop robust machine learning (ML)-driven diagnostic models and to identify key biomarkers through integrated analysis of multi-cohort transcriptomic data. Methods: Seven NPC transcriptomic datasets were integrated and preprocessed, with ComBat used for batch effect correction. GSE12452, GSE40290, GSE53819, and GSE64634 were merged to form the training cohort, while GSE13597, GSE34573, and GSE61218 served as independent external validation sets. Differential expression analysis identified 293 differentially expressed genes (DEGs). Twelve ML algorithms (including Stepglm, glmBoost, and RF) were systematically combined into 113 distinct models to classify NPC versus normal tissues. Top-performing models underwent external validation. Immune infiltration patterns were analyzed using CIBERSORT, and functional enrichment was analyzed using GSEA/GSVA. Results: The Stepglm[both]-RF hybrid model achieved AUCs of 0.999 (training set; 95% CI: 0.997–1.000), 1.000 (GSE61218 and GSE34573 validation), and 0.960 (GSE13597 validation). The glmBoost-RF model showed comparable efficacy, achieving AUCs of 1.000 (training), 0.950 (GSE61218), 1.000 (GSE34573), and 0.947 (GSE13597). Single-gene analysis identified RCN1 as a promising diagnostic marker (AUC = 0.953); elevated RCN1 expression correlated with poor prognosis in head and neck squamous cell carcinoma (HNSCC; p < 0.05). Immune profiling revealed significant enrichment of M1 macrophages and a concomitant reduction of memory B cells in NPC. Functional enrichment analysis associated RCN1 with cell cycle regulation and immune-related pathways. Conclusion: This study establishes two high-performance ML models (Stepglm[both]-RF and glmBoost-RF) with low variability for NPC diagnosis and identifies RCN1 as a dual-function biomarker with diagnostic and prognostic potential. The findings provide a scalable framework for early NPC detection and novel insights into immune microenvironment dysregulation. |
| format | Article |
| id | doaj-art-2284012a5597487aa0d04ecb69571f4e |
| institution | Kabale University |
| issn | 2730-6011 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Oncology |
| spelling | doaj-art-2284012a5597487aa0d04ecb69571f4e; 2025-08-20T03:45:31Z; eng; Springer; Discover Oncology; 2730-6011; 2025-06-01; 161117; 10.1007/s12672-025-02932-2; Integrating transcriptomics and hybrid machine learning enables high-accuracy diagnostic modeling for nasopharyngeal carcinoma; Hehe Wang [0], Junge Zhang [1], Peng Cheng [2], Lujie Yu [3], Chunlin Li [4], Yaowen Wang [5]; [0, 2, 3, 4, 5] Department of Otolaryngology, Head and Neck Surgery, The First Affiliated Hospital of Ningbo University; [1] Department of Anesthesiology, The First Affiliated Hospital of Ningbo University; https://doi.org/10.1007/s12672-025-02932-2; Nasopharyngeal carcinoma; Machine learning; Diagnostic model; Biomarker discovery; RCN1 |
| title | Integrating transcriptomics and hybrid machine learning enables high-accuracy diagnostic modeling for nasopharyngeal carcinoma |
| topic | Nasopharyngeal carcinoma; Machine learning; Diagnostic model; Biomarker discovery; RCN1 |
| url | https://doi.org/10.1007/s12672-025-02932-2 |
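
The abstract describes hybrid models that pair two algorithms (for example Stepglm[both]-RF) and are scored by AUC on a training cohort and on external cohorts. The sketch below illustrates that general pattern only, and it rests on several assumptions not stated in the record: that the first algorithm selects genes and the second classifies, that a standardized mean-difference filter can stand in for stepwise GLM selection, and that a nearest-centroid scorer can stand in for the random forest. The data are synthetic, and every function name here (`auc`, `make_cohort`, `select_genes`, `fit_centroids`, `score`) is hypothetical. It is not the authors' pipeline and reproduces none of their results.

```python
import random
from statistics import mean, pstdev


def auc(scores, labels):
    """AUC as the Mann-Whitney statistic: the share of (tumour, normal)
    pairs in which the tumour sample scores higher; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_cohort(n_per_class, n_genes, informative, shift, rng):
    """Synthetic expression matrix (assumption: Gaussian noise, with tumour
    samples shifted upward on the informative genes)."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in informative:
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


def select_genes(X, y, k):
    """Stage 1 stand-in: keep the k genes with the largest standardized
    difference between class means."""
    def effect(g):
        tumour = [row[g] for row, lab in zip(X, y) if lab == 1]
        normal = [row[g] for row, lab in zip(X, y) if lab == 0]
        spread = pstdev(tumour + normal) or 1.0  # guard against zero spread
        return abs(mean(tumour) - mean(normal)) / spread
    return sorted(range(len(X[0])), key=effect, reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Stage 2 stand-in: per-class centroids on the selected genes."""
    centroids = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def score(centroids, genes, row):
    """Higher score means closer to the tumour centroid than the normal one."""
    d_normal = sum((row[g] - c) ** 2 for g, c in zip(genes, centroids[0]))
    d_tumour = sum((row[g] - c) ** 2 for g, c in zip(genes, centroids[1]))
    return d_normal - d_tumour


rng = random.Random(0)  # fixed seed so the run is repeatable
informative = [0, 1, 2, 3, 4]
X_train, y_train = make_cohort(40, 50, informative, 2.0, rng)
X_ext, y_ext = make_cohort(20, 50, informative, 2.0, rng)

# Both stages are fitted on the training cohort only; the external cohort
# is used purely for evaluation.
genes = select_genes(X_train, y_train, k=5)
centroids = fit_centroids(X_train, y_train, genes)
train_auc = auc([score(centroids, genes, r) for r in X_train], y_train)
ext_auc = auc([score(centroids, genes, r) for r in X_ext], y_ext)
print(f"selected genes: {sorted(genes)}")
print(f"training AUC: {train_auc:.3f}, external AUC: {ext_auc:.3f}")
```

Keeping gene selection and classifier fitting confined to the training cohort is the design point worth noting: an external-cohort AUC is only an honest estimate if neither stage has seen those samples.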