Development and validation of an interpretable machine learning model for predicting Philadelphia chromosome-positive acute lymphoblastic leukaemia using clinical and laboratory parameters: a single-centre retrospective study

Bibliographic Details
Main Authors: Jing Zhang, Cheng Zhang, Xi Zhang, Wuchen Yang, Jingya Liu, Yang Gou, Xingqin Huang, Maoshan Chen, Dezhi Huang, Shengwang Wu, Shuiqing Liu, Xiangui Peng
Format: Article
Language: English
Published: BMJ Publishing Group 2025-06-01
Series: BMJ Open
Online Access: https://bmjopen.bmj.com/content/15/6/e097526.full
Description:
Objective: To develop and validate a prediction model of Philadelphia chromosome-positive acute lymphoblastic leukaemia (Ph+ALL).
Design: A single-centre retrospective study.
Participants: This study analysed 471 newly diagnosed patients with ALL at the Second Affiliated Hospital of Army Medical University from January 2014 to December 2023.
Methods: Clinical and laboratory parameters were collected, and the most important parameters were selected using BorutaShap. Multiple machine learning (ML) models were constructed and optimised using an active learning (AL) algorithm. Performance was evaluated using the area under the curve (AUC), comprehensive indicators and decision curve analysis. Model interpretability was assessed using SHapley Additive exPlanations (SHAP), and external validation was conducted on an independent test cohort.
Results: 10 parameters were selected to construct multiple ML models. The CatBoost model integrated with an AL algorithm (CatBoost-AL) was the most effective model for predicting Ph+ALL in the validation data set. This model achieved an AUC of 0.797 (95% CI 0.710 to 0.884), with sensitivity, specificity and F1 score of 0.667, 0.864 and 0.777, respectively. The prediction performance of CatBoost-AL was further validated on an external testing set, where it maintained an AUC of 0.794 (95% CI 0.707 to 0.881). In the global SHAP interpretability analysis, age, monocyte count, γ-glutamyl transferase, neutrophil count and alanine aminotransferase were identified as the parameters with the greatest influence on the predictions of CatBoost-AL.
Conclusion: An interpretable ML model and online prediction tool were developed to determine whether newly diagnosed patients with ALL have Ph+ALL. The key parameters identified by the optimal model provide further understanding of Ph+ALL characteristics and are valuable for the accurate diagnosis and treatment of Ph+ALL.
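The abstract reports AUC, sensitivity, specificity and F1 score. As a rough illustration of how such metrics are usually computed for a binary classifier, here is a minimal Python sketch. It is not the authors' code: the function names, the 0.5 threshold and the toy labels and scores are all hypothetical, and the AUC uses the standard pairwise-ranking definition rather than any particular library.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means Ph+ALL."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity_specificity_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), F1 = 2TP/(2TP+FP+FN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return sensitivity, specificity, f1


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(sensitivity_specificity_f1(labels, preds))
    print(auc_rank(labels, scores))
```

Sensitivity, specificity and F1 depend on the chosen decision threshold, whereas AUC summarises ranking quality across all thresholds, which is why the abstract can report a moderate sensitivity alongside a higher specificity for the same model.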
ISSN: 2044-6055
DOI: 10.1136/bmjopen-2024-097526
Volume: 15
Issue: 6
Author Affiliations (in author order):
Jing Zhang: State Key Laboratory of Trauma and Chemical Poisoning, Chongqing, China
Cheng Zhang: Medical Center of Hematology, The Second Affiliated Hospital of Army Medical University, Chongqing, China
Xi Zhang: Medical Center of Hematology, The Second Affiliated Hospital of Army Medical University, Chongqing, China
Wuchen Yang: Medical Center of Hematology, The Second Affiliated Hospital of Army Medical University, Chongqing, China
Jingya Liu: Medical Center of Hematology, The Second Affiliated Hospital of Army Medical University, Chongqing, China
Yang Gou: Chongqing Key Laboratory of Hematology and Microenvironment, Chongqing, China
Xingqin Huang: Department of Hematology, Third Military Medical University Southwest Hospital, Chongqing, China
Maoshan Chen: Laboratory of Radiation Biology, Department of Blood Transfusion, Laboratory Medicine Center, Third Military Medical University Second Affiliated Hospital, Chongqing, China
Dezhi Huang: Medical Center of Hematology, The Second Affiliated Hospital of Army Medical University, Chongqing, China
Shengwang Wu: Chongqing Key Laboratory of Hematology and Microenvironment, Chongqing, China
Shuiqing Liu: Medical Center of Hematology, The Second Affiliated Hospital of Army Medical University, Chongqing, China
Xiangui Peng: Medical Center of Hematology, The Second Affiliated Hospital of Army Medical University, Chongqing, China