Integrating radiological and clinical data for clinically significant prostate cancer detection with machine learning techniques

Abstract In prostate cancer (PCa), risk calculators that rely on clinical parameters and magnetic resonance imaging (MRI) have been proposed to enable early prediction of clinically significant prostate cancer (CsPCa). The prostate imaging–reporting and data system (PI-RADS) is combined with clinical variables p...

Bibliographic Details
Main Authors: Luis Mariano Esteban, Ángel Borque-Fernando, Maria Etelvina Escorihuela, Javier Esteban-Escaño, Jose María Abascal, Pol Servian, Juan Morote
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Reports
Subjects: Clinically significant prostate cancer; Machine learning; Clinical utility; SHAP values
Online Access: https://doi.org/10.1038/s41598-025-88297-6
author Luis Mariano Esteban
Ángel Borque-Fernando
Maria Etelvina Escorihuela
Javier Esteban-Escaño
Jose María Abascal
Pol Servian
Juan Morote
collection DOAJ
description Abstract In prostate cancer (PCa), risk calculators that rely on clinical parameters and magnetic resonance imaging (MRI) have been proposed to enable early prediction of clinically significant prostate cancer (CsPCa). In these calculators, the prostate imaging–reporting and data system (PI-RADS) category is combined with clinical variables, predominantly through logistic regression models. This study explores modeling with regularized regression (ridge, LASSO, and elastic net), classification trees, tree ensemble models such as random forest and XGBoost, and neural networks to predict CsPCa in a dataset of 4799 patients in Catalonia (Spain). An 80/20 split was used for training and validation. The predictor variables were age, prostate-specific antigen (PSA), prostate volume, PSA density (PSAD), digital rectal exam (DRE) findings, family history of PCa, a previous negative biopsy, and PI-RADS category. At a sensitivity of 0.9 in the validation set, the XGBoost model outperforms the others with a specificity of 0.640, followed closely by random forest (0.638), neural network (0.634), and logistic regression (0.620). In terms of clinical utility, for a 10% misclassification of CsPCa, XGBoost can avoid 41.77% of unnecessary biopsies, followed closely by random forest (41.67%) and neural networks (41.46%), while logistic regression has a lower rate of 40.62%. Using SHAP values for model explainability, PI-RADS emerges as the most influential risk factor, particularly for individuals with PI-RADS 4 and 5. Additionally, a positive DRE or a family history of prostate cancer proves highly influential for certain individuals, while a previous negative biopsy serves as a protective factor for others.
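The abstract compares models by their specificity at a fixed sensitivity of 0.9. The following is a minimal sketch of how such an operating point could be read off a set of predicted risks. It is not the authors' code: the function name `specificity_at_sensitivity`, the rule "positive when score >= threshold", and the toy labels and scores are all assumptions made for illustration, and the numbers are unrelated to the study data.

```python
from typing import Sequence, Tuple


def specificity_at_sensitivity(
    y_true: Sequence[int],
    scores: Sequence[float],
    target_sensitivity: float = 0.9,
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity is at least target_sensitivity.

    Hypothetical helper: a case is called positive when score >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    # Walk candidate thresholds from the highest score downwards. Sensitivity
    # never decreases as the threshold drops, so the first threshold that
    # reaches the target is the strictest one that does.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        sensitivity = tp / positives
        if sensitivity >= target_sensitivity:
            tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
            return threshold, sensitivity, tn / negatives
    raise ValueError("target sensitivity cannot be reached")


# Toy data (made up): 10 cases with the outcome, 10 without.
y_true = [1] * 10 + [0] * 10
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20,
          0.72, 0.62, 0.58, 0.56, 0.45, 0.40, 0.30, 0.25, 0.10, 0.05]

threshold, sensitivity, specificity = specificity_at_sensitivity(y_true, scores)
print(f"threshold={threshold}, sensitivity={sensitivity}, specificity={specificity}")
```

Fixing sensitivity first and then comparing specificity puts every model on the same footing with respect to missed cancers, which is why the abstract can rank the models with a single number each.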
format Article
id doaj-art-cca879b773b64d338ba60bec89899fce
institution Kabale University
issn 2045-2322
language English
publishDate 2025-02-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-cca879b773b64d338ba60bec89899fce
2025-02-09T12:33:09Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-02-01
vol. 15, no. 1, pp. 1–21
10.1038/s41598-025-88297-6
Integrating radiological and clinical data for clinically significant prostate cancer detection with machine learning techniques
Luis Mariano Esteban: Department of Applied Mathematics, Escuela Universitaria Politécnica de La Almunia, Universidad de Zaragoza
Ángel Borque-Fernando: Department of Urology, Miguel Servet University Hospital
Maria Etelvina Escorihuela: Department of Applied Mathematics, Escuela Universitaria Politécnica de La Almunia, Universidad de Zaragoza
Javier Esteban-Escaño: Department of Electronic Engineering and Communications, Escuela Universitaria Politécnica de La Almunia, Universidad de Zaragoza
Jose María Abascal: Department of Urology, Department of Surgery, Parc de Salut Mar, Universitat Pompeu Fabra
Pol Servian: Department of Urology, Hospital Germans Trias i Pujol
Juan Morote: Department of Urology, Vall d’Hebron Hospital
https://doi.org/10.1038/s41598-025-88297-6
Clinically significant prostate cancer
Machine learning
Clinical utility
SHAP values
title Integrating radiological and clinical data for clinically significant prostate cancer detection with machine learning techniques
topic Clinically significant prostate cancer
Machine learning
Clinical utility
SHAP values
url https://doi.org/10.1038/s41598-025-88297-6