DIA-BERT: pre-trained end-to-end transformer models for enhanced DIA proteomics data analysis
Abstract: Data-independent acquisition mass spectrometry (DIA-MS) has become increasingly pivotal in quantitative proteomics. In this study, we present DIA-BERT, a software tool that harnesses a transformer-based pre-trained artificial intelligence (AI) model for analyzing DIA proteomics data. The identification model was trained on over 276 million high-quality peptide precursors extracted from existing DIA-MS files, while the quantification model was trained on 34 million peptide precursors from synthetic DIA-MS files. Compared to DIA-NN, DIA-BERT identified on average 51% more proteins and 22% more peptide precursors across five human cancer sample sets (cervical cancer, pancreatic adenocarcinoma, myosarcoma, gallbladder cancer, and gastric carcinoma), while achieving high quantitative accuracy. This study underscores the potential of leveraging pre-trained models and synthetic datasets to enhance the analysis of DIA proteomics.
| Main Authors: | Zhiwei Liu, Pu Liu, Yingying Sun, Zongxiang Nie, Xiaofan Zhang, Yuqi Zhang, Yi Chen, Tiannan Guo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-025-58866-4 |
| Field | Value |
|---|---|
| author | Zhiwei Liu, Pu Liu, Yingying Sun, Zongxiang Nie, Xiaofan Zhang, Yuqi Zhang, Yi Chen, Tiannan Guo |
| collection | DOAJ |
| institution | OA Journals |
| issn | 2041-1723 |
| affiliations | Zhiwei Liu, Yingying Sun, Zongxiang Nie, Xiaofan Zhang, Yuqi Zhang, Yi Chen, Tiannan Guo: Affiliated Hangzhou First People’s Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University; Pu Liu: Westlake Omics (Hangzhou) Biotechnology Co., Ltd. |
| title | DIA-BERT: pre-trained end-to-end transformer models for enhanced DIA proteomics data analysis |
| url | https://doi.org/10.1038/s41467-025-58866-4 |