DrugBERT: a BERT-based approach integrating LDA topic embedding and efficacy-aware mechanism for predicting anti-tumor drug efficacy

Abstract Background Due to the complexity of tumor genetic heterogeneity, personalized medicine has progressively emerged as a central focus of cancer research. However, accurately predicting a patient's drug response before treatment remains a critical challenge to the development...

Bibliographic Details
Main Authors: Weiwei Zhu, Xiaodong Jiang, Lei Zhang, Peng Zhou, Xinping Xie, Hongqiang Wang
Format: Article
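Language: English
Published: BMC 2025-08-01
Series: Journal of Translational Medicine
Subjects: Drug efficacy prediction; LDA topic embedding; BERT; Self-attention mechanism; Clinical text data
Online Access: https://doi.org/10.1186/s12967-025-06795-7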
author Weiwei Zhu
Xiaodong Jiang
Lei Zhang
Peng Zhou
Xinping Xie
Hongqiang Wang
collection DOAJ
description Abstract. Background: Due to the complexity of tumor genetic heterogeneity, personalized medicine has progressively emerged as a central focus of cancer research. However, accurately predicting a patient's drug response before treatment remains a critical challenge to the development of this field. Methods: This paper proposes DrugBERT, a BERT-based framework integrated with LDA topic embedding and a drug efficacy-aware mechanism for predicting the efficacy of antitumor drugs. The method incorporates LDA-generated topic embeddings into the BERT language model as a semantic enhancement module and introduces a drug efficacy-aware attention mechanism to prioritize semantic features related to drug efficacy. The model uses an LSTM to capture long-range dependencies in clinical text data. In addition, the SMOTE algorithm is used to synthesize minority-class samples to address data imbalance. Results: DrugBERT demonstrated remarkable performance on a dataset of 958 patients with non-small cell cancer treated with antitumor drugs. Furthermore, when validated on an independent dataset of 266 bowel cancer patients, the model achieved a 3% improvement in AUC over previous methods, indicating robust generalization capability. Conclusions: DrugBERT can help predict the efficacy of antitumor drugs from clinical text while exhibiting strong generalization capability. These findings highlight its potential for optimizing personalized therapeutic strategies through language models.
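The abstract mentions SMOTE for synthesizing minority-class samples. As a rough illustration of that general idea only (not the authors' implementation), the sketch below creates each synthetic sample by interpolating between a minority-class point and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy two-dimensional data are hypothetical; the sketch also assumes the samples have already been converted to numeric feature vectors.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric feature vectors from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four points at the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by that class. In practice a maintained library implementation would normally be preferred over a hand-written one.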
format Article
id doaj-art-ba307aa7f2d440d2b5a711925e58863d
institution Kabale University
issn 1479-5876
language English
publishDate 2025-08-01
publisher BMC
record_format Article
series Journal of Translational Medicine
spelling doaj-art-ba307aa7f2d440d2b5a711925e58863d
2025-08-20T04:02:55Z
eng
BMC
Journal of Translational Medicine
1479-5876
2025-08-01
231111
10.1186/s12967-025-06795-7
Weiwei Zhu (0): University of Science and Technology of China
Xiaodong Jiang (1): Medical Oncology Department, The First Affiliated Hospital of University of Science and Technology of China
Lei Zhang (2): Department of Pharmacy, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China
Peng Zhou (3): School of Life Science, Hefei Normal University
Xinping Xie (4): School of Mathematics and Physics, Anhui Jianzhu University
Hongqiang Wang (5): Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences
title DrugBERT: a BERT-based approach integrating LDA topic embedding and efficacy-aware mechanism for predicting anti-tumor drug efficacy
topic Drug efficacy prediction
LDA topic embedding
BERT
Self-attention mechanism
Clinical text data
url https://doi.org/10.1186/s12967-025-06795-7