π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing

Bibliographic Details
Main Authors: Xiang Zhang, Tianze Ling, Zhi Jin, Sheng Xu, Zhiqiang Gao, Boyan Sun, Zijie Qiu, Jiaqi Wei, Nanqing Dong, Guangshuai Wang, Guibin Wang, Leyuan Li, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Fuchu He, Wanli Ouyang, Cheng Chang, Siqi Sun
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-024-55021-3
collection DOAJ
description Abstract Peptide sequencing via tandem mass spectrometry (MS/MS) is essential in proteomics. Unlike traditional database searches, deep learning excels at de novo peptide sequencing, even for peptides missing from existing databases. Current deep learning models often rely on autoregressive generation, which suffers from error accumulation and slow inference speeds. In this work, we introduce π-PrimeNovo, a non-autoregressive Transformer-based model for peptide sequencing. With our architecture design and a CUDA-enhanced decoding module for precise mass control, π-PrimeNovo achieves significantly higher accuracy and up to 89x faster inference than state-of-the-art methods, making it ideal for large-scale applications like metaproteomics. Additionally, it excels in phosphopeptide mining and detecting low-abundance post-translational modifications (PTMs), marking a substantial advance in peptide sequencing with broad potential in biological research.
id doaj-art-af31071132d349ff803c1ce41beccd15
institution DOAJ
issn 2041-1723
Author affiliations:
Xiang Zhang: Shanghai Artificial Intelligence Laboratory
Tianze Ling: Tsinghua University
Zhi Jin: Shanghai Artificial Intelligence Laboratory
Sheng Xu: Shanghai Artificial Intelligence Laboratory
Zhiqiang Gao: Shanghai Artificial Intelligence Laboratory
Boyan Sun: State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics
Zijie Qiu: Shanghai Artificial Intelligence Laboratory
Jiaqi Wei: Shanghai Artificial Intelligence Laboratory
Nanqing Dong: Shanghai Artificial Intelligence Laboratory
Guangshuai Wang: Shanghai Artificial Intelligence Laboratory
Guibin Wang: State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics
Leyuan Li: State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics
Muhammad Abdul-Mageed: University of British Columbia
Laks V. S. Lakshmanan: University of British Columbia
Fuchu He: State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics
Wanli Ouyang: Shanghai Artificial Intelligence Laboratory
Cheng Chang: State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics
Siqi Sun: Research Institute of Intelligent Complex Systems, Fudan University