DIA-BERT: pre-trained end-to-end transformer models for enhanced DIA proteomics data analysis

Abstract: Data-independent acquisition mass spectrometry (DIA-MS) has become increasingly pivotal in quantitative proteomics. In this study, we present DIA-BERT, a software tool that harnesses a transformer-based pre-trained artificial intelligence (AI) model for analyzing DIA proteomics data. The identification model was trained using over 276 million high-quality peptide precursors extracted from existing DIA-MS files, while the quantification model was trained on 34 million peptide precursors from synthetic DIA-MS files. When compared to DIA-NN, DIA-BERT demonstrated a 51% increase in protein identifications and 22% more peptide precursors on average across five human cancer sample sets (cervical cancer, pancreatic adenocarcinoma, myosarcoma, gallbladder cancer, and gastric carcinoma), achieving high quantitative accuracy. This study underscores the potential of leveraging pre-trained models and synthetic datasets to enhance the analysis of DIA proteomics.

Bibliographic Details
Main Authors: Zhiwei Liu, Pu Liu, Yingying Sun, Zongxiang Nie, Xiaofan Zhang, Yuqi Zhang, Yi Chen, Tiannan Guo
Affiliations: Affiliated Hangzhou First People’s Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University; Westlake Omics (Hangzhou) Biotechnology Co., Ltd.
Format: Article
Language: English
Published: Nature Portfolio, 2025-04-01
Series: Nature Communications
ISSN: 2041-1723
Online Access: https://doi.org/10.1038/s41467-025-58866-4