BertADP: a fine-tuned protein language model for anti-diabetic peptide prediction

Bibliographic Details
Main Authors: Xueqin Xie, Changchun Wu, Yixuan Qi, Shanghua Liu, Jian Huang, Hao Lyu, Fuying Dao, Hao Lin
Format: Article
Language: English
Published: BMC 2025-07-01
Series: BMC Biology
Online Access:https://doi.org/10.1186/s12915-025-02312-w
Collection: DOAJ
Abstract
Background: Diabetes is a global metabolic disease that urgently calls for the development of new and effective therapeutic agents. Anti-diabetic peptides (ADPs) have emerged as a research hotspot because of their therapeutic potential and natural safety, making them a promising class of functional peptides for diabetes management. However, conventional computational approaches for ADP prediction rely mainly on manually extracted sequence features. These methods often lack generalizability and perform poorly on short peptides, which hinders effective ADP discovery.
Results: In this study, we introduce a strategy for fine-tuning large-scale pre-trained protein language models (PLMs) for ADP prediction, enabling automated extraction of discriminative semantic representations. We established the most comprehensive ADP dataset to date, comprising 899 rigorously curated non-redundant ADPs and 67 newly collected potential candidates. Using three model construction strategies, we developed 11 candidate models. Among them, BertADP (a fine-tuned ProtBert model) demonstrated superior performance on the independent test set, outperforming existing ADP prediction tools with an overall accuracy of 0.955, sensitivity of 1.000, and specificity of 0.910. Notably, BertADP adapted well to sequence length, maintaining stable performance on both standard-length and short peptide sequences.
Conclusions: BertADP is the first PLM-based intelligent prediction tool for ADPs. Its exceptional identification capability will significantly accelerate anti-diabetic drug development and facilitate personalized therapeutic strategies, thereby enhancing precision diabetes management. Furthermore, the proposed approach provides a generalizable framework that can be extended to other bioactive peptide discovery studies, offering an innovative solution for bioactive peptide mining.
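The three reported metrics are linked through the confusion matrix, and a small calculation makes that relationship easy to check. The sketch below is illustrative only: the record does not give the size or class balance of the independent test set, so the counts used in the usage example (100 ADPs and 100 non-ADPs) are hypothetical. The helper name `binary_metrics` is likewise an assumption, not part of BertADP. When the test set is balanced, accuracy equals the mean of sensitivity and specificity, and (1.000 + 0.910) / 2 = 0.955 is consistent with the reported figures.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: true ADPs predicted as ADP / as non-ADP.
    tn/fp: non-ADPs predicted as non-ADP / as ADP.
    """
    sensitivity = tp / (tp + fn)  # fraction of true ADPs that are recovered
    specificity = tn / (tn + fp)  # fraction of non-ADPs correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Hypothetical balanced test set: 100 positives, 100 negatives.
    # These counts are illustrative and are NOT taken from the paper.
    metrics = binary_metrics(tp=100, fn=0, tn=91, fp=9)
    print({name: round(value, 3) for name, value in metrics.items()})
    # With equal class sizes, accuracy is the mean of sensitivity and specificity.
    print(round((metrics["sensitivity"] + metrics["specificity"]) / 2, 3))
```

On an imbalanced test set the identity no longer holds, so accuracy alone could mask a weak specificity; reporting sensitivity and specificity separately, as the abstract does, avoids that ambiguity.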
Institution: Kabale University
ISSN: 1741-7007
Affiliations:
The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China (Xueqin Xie, Changchun Wu, Yixuan Qi, Shanghua Liu, Jian Huang, Hao Lyu, Hao Lin)
School of Biological Sciences, Nanyang Technological University (Fuying Dao)
Keywords: Anti-diabetic peptides; Protein language models; Fine-tuning; Bioactive peptide prediction; Deep learning