Identification of multiomics and immune infiltration-associated biomarkers for early gastric cancer: a machine learning-based diagnostic model development study
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC 2025-05-01 |
| Series: | BMC Cancer |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s12885-025-14396-2 |
| Summary: | Background: Gastric cancer (GC) is a leading cause of cancer-related deaths worldwide, and early diagnosis remains a significant challenge. Available serum biomarkers lack specificity, making it difficult to accurately identify early non-metastatic GC cases. Reliable diagnostic biomarkers that can detect early GC are critical to improving prognosis. Methods: We employed serum proteomics combined with bioinformatics to identify genes differentially expressed in the serum of non-metastatic GC patients. Single-cell RNA sequencing (scRNA-seq) and immune infiltration analysis were performed to evaluate the relationship between gene expression and immune cell function. We then evaluated 107 machine learning models for biomarker-based early GC diagnosis and developed a nomogram validated for accuracy and clinical utility, and subsequently compared the performance of the candidate biomarkers with that of traditional tumor markers in diagnosing early GC. Quantitative reverse transcription polymerase chain reaction (qRT-PCR) and immunohistochemical staining data from the Human Protein Atlas (HPA) database were used to validate the differential expression of candidate genes in GC tissues and adjacent non-cancerous tissues. Results: The proteomic analysis identified several genes upregulated in the serum of GC patients compared to healthy controls. Single-cell RNA sequencing analysis further revealed that these upregulated genes were associated with altered immune cell infiltration in the tumor microenvironment. The glmBoost + XGBoost model incorporating B2M, CFL1, CTSD, and HSP90AB1 demonstrated strong diagnostic performance (mean AUC = 0.792), with 101 algorithm combinations achieving an average AUC > 0.7. A nomogram integrating gene expression and clinical data was developed and validated through calibration and decision curve analyses, highlighting its potential for early GC diagnosis. Additionally, four genes (TAGLN2, HSP90AB1, SH3BGRL3, and CFL1) were found to be highly expressed in non-metastatic GC tissues and were significantly correlated with immune infiltration, including CD8+ T cells, monocytes, and myeloid-derived suppressor cells. These findings were validated by qRT-PCR and immunohistochemical analyses, confirming their elevated expression in GC tissues. Conclusions: TAGLN2, HSP90AB1, SH3BGRL3, and CFL1 are potential diagnostic biomarkers for early-stage GC and are strongly associated with immune cell infiltration. The machine learning model shows strong diagnostic performance. These results provide a foundation for future studies to improve early diagnosis and individualized treatment strategies for GC. |
|---|---|
| ISSN: | 1471-2407 |