Machine learning-based identification of co-expressed genes in prostate cancer and CRPC and construction of prognostic models

Abstract The objective of this study was to employ machine learning to identify shared differentially expressed genes (DEGs) in prostate cancer (PCa) initiation and castration resistance, aiming to establish a robust prognostic model and enhance understanding of patient prognosis for personalized tr...

Bibliographic Details
Main Authors: Changhui Fan, Zhiheng Huang, Han Xu, Tianhe Zhang, Haiyang Wei, Junfeng Gao, Changbao Xu
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Reports
Subjects: Prostate cancer; Castration resistance; TCGA; Differentially expressed genes; Prognostic modeling
Online Access: https://doi.org/10.1038/s41598-025-90444-y
author Changhui Fan
Zhiheng Huang
Han Xu
Tianhe Zhang
Haiyang Wei
Junfeng Gao
Changbao Xu
Changhui Fan
author_sort Changhui Fan
collection DOAJ
description Abstract The objective of this study was to use machine learning to identify differentially expressed genes (DEGs) shared between prostate cancer (PCa) initiation and castration resistance, in order to establish a robust prognostic model and improve the understanding of patient prognosis for personalized treatment strategies. mRNA transcriptome data associated with castration-resistant prostate cancer (CRPC) were obtained from the GEO database. Differential expression analysis was conducted with the limma R package, comparing normal prostate samples with PCa samples and PCa samples with CRPC samples. LASSO regression and univariate and multivariate Cox regression analyses were then applied to identify prognosis-related genes and build prognostic models. Validation was performed on the TCGA_PRAD dataset to confirm the expression differences of the hub genes and to explore their correlation with clinical variables and their prognostic significance. We established a seven-gene prognostic risk model for prostate cancer (KIF4A, UBE2C, FAM72D, CCDC78, HOXD9, LIX1 and SLC5A8) and verified its accuracy on an independent dataset. Calibration curve and decision curve results indicate that the model has potential clinical application value, and the nomogram accurately predicted patient prognosis. Additionally, elevated expression of KIF4A, UBE2C, and FAM72D, or reduced expression of LIX1, correlated with advanced pathological T and N stages, clinical T stage, prostate-specific antigen (PSA) level, age at diagnosis, Gleason score, and shorter progression-free interval (PFI) (P < 0.05). By integrating bioinformatics analysis with clinical data, we established a reliable prognostic model for prostate cancer and identified key genes involved in disease progression and treatment resistance. These findings provide novel insights and methods for assessing prognosis and tailoring treatment strategies for prostate cancer patients.
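The workflow summarized in the abstract (take the DEGs shared by the two comparisons, then score each patient with a gene-weighted linear model) can be illustrated with a minimal Python sketch. It is not the authors' code, which the abstract says relied on the limma R package and on LASSO and Cox regression. All coefficients, expression values, patient IDs, and the extra names `GENE_X` and `GENE_Y` are hypothetical placeholders, and the median cutoff for risk groups is an assumption that the abstract does not state.

```python
from statistics import median


def shared_degs(degs_a, degs_b):
    """Return the genes called differentially expressed in both comparisons."""
    return sorted(set(degs_a) & set(degs_b))


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical DEG lists for the two comparisons (normal vs PCa, PCa vs CRPC).
normal_vs_pca = {"KIF4A", "UBE2C", "LIX1", "GENE_X"}
pca_vs_crpc = {"KIF4A", "UBE2C", "LIX1", "GENE_Y"}
shared = shared_degs(normal_vs_pca, pca_vs_crpc)

# Placeholder coefficients, NOT values from the paper; in practice they would
# come from a fitted multivariate Cox model.
coefficients = {"KIF4A": 0.5, "UBE2C": 0.25, "LIX1": -0.5}

# Placeholder expression values for three hypothetical patients.
patients = {
    "P1": {"KIF4A": 2.0, "UBE2C": 4.0, "LIX1": 1.0},
    "P2": {"KIF4A": 1.0, "UBE2C": 0.0, "LIX1": 3.0},
    "P3": {"KIF4A": 0.0, "UBE2C": 2.0, "LIX1": 0.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)

print(shared)
print(scores)
print(groups)
```

In this toy setup a negative coefficient stands for a gene whose higher expression lowers the score, which matches the direction the abstract reports for LIX1; the magnitudes carry no meaning.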
format Article
id doaj-art-a97ef811da804e5d8a8c8b9d44a140a7
institution DOAJ
issn 2045-2322
language English
publishDate 2025-02-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-a97ef811da804e5d8a8c8b9d44a140a7
2025-08-20T02:48:18Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-02-01
151113
10.1038/s41598-025-90444-y
Machine learning-based identification of co-expressed genes in prostate cancer and CRPC and construction of prognostic models
Changhui Fan (0), Second Affiliated Hospital of Zhengzhou University
Zhiheng Huang (1), Second Affiliated Hospital of Zhengzhou University
Han Xu (2), School of Basic Medical Sciences, Zhengzhou University
Tianhe Zhang (3), Second Affiliated Hospital of Zhengzhou University
Haiyang Wei (4), Second Affiliated Hospital of Zhengzhou University
Junfeng Gao (5), Second Affiliated Hospital of Zhengzhou University
Changbao Xu (6), Second Affiliated Hospital of Zhengzhou University
Changhui Fan (7), Second Affiliated Hospital of Zhengzhou University
https://doi.org/10.1038/s41598-025-90444-y
Prostate cancer
Castration resistance
TCGA
Differentially expressed genes
Prognostic modeling
title Machine learning-based identification of co-expressed genes in prostate cancer and CRPC and construction of prognostic models
title_sort machine learning based identification of co expressed genes in prostate cancer and crpc and construction of prognostic models
topic Prostate cancer
Castration resistance
TCGA
Differentially expressed genes
Prognostic modeling
url https://doi.org/10.1038/s41598-025-90444-y