NetStart 2.0: prediction of eukaryotic translation initiation sites using a protein language model
Abstract. Background: Accurate identification of translation initiation sites is essential for the proper translation of mRNA into functional proteins. In eukaryotes, the choice of the translation initiation site is influenced by multiple factors, including its proximity to the 5′ end and...
| Main Authors: | Line Sandvad Nielsen, Anders Gorm Pedersen, Ole Winther, Henrik Nielsen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-08-01 |
| Series: | BMC Bioinformatics |
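| Subjects: | Protein language models; Translation initiation sites; Start codons; Deep learning; “Protein-ness”; Coding potential |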
| Online Access: | https://doi.org/10.1186/s12859-025-06220-2 |
| _version_ | 1849225813349629952 |
|---|---|
| author | Line Sandvad Nielsen; Anders Gorm Pedersen; Ole Winther; Henrik Nielsen |
| author_facet | Line Sandvad Nielsen; Anders Gorm Pedersen; Ole Winther; Henrik Nielsen |
| author_sort | Line Sandvad Nielsen |
| collection | DOAJ |
| description | Abstract. Background: Accurate identification of translation initiation sites is essential for the proper translation of mRNA into functional proteins. In eukaryotes, the choice of the translation initiation site is influenced by multiple factors, including its proximity to the 5′ end and the local start codon context. Translation initiation sites mark the transition from non-coding to coding regions. This fact motivates the expectation that the upstream sequence, if translated, would assemble a nonsensical order of amino acids, while the downstream sequence would correspond to the structured beginning of a protein. This distinction suggests potential for predicting translation initiation sites using a protein language model. Results: We present NetStart 2.0, a deep learning-based model that integrates the ESM-2 protein language model with the local sequence context to predict translation initiation sites across a broad range of eukaryotic species. NetStart 2.0 was trained as a single model across multiple species, and despite the broad phylogenetic diversity represented in the training data, it consistently relied on features marking the transition from non-coding to coding regions. Conclusion: By leveraging “protein-ness”, NetStart 2.0 achieves state-of-the-art performance in predicting translation initiation sites across a diverse range of eukaryotic species. This success underscores the potential of protein language models to bridge transcript- and peptide-level information in complex biological prediction tasks. The NetStart 2.0 webserver is available at https://services.healthtech.dtu.dk/services/NetStart-2.0/ |
| format | Article |
| id | doaj-art-07f2ff9fadb54390a9b54bca9529dd0e |
| institution | Kabale University |
| issn | 1471-2105 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Bioinformatics |
| spelling | doaj-art-07f2ff9fadb54390a9b54bca9529dd0e; 2025-08-24T11:54:34Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-08-01; 261122; 10.1186/s12859-025-06220-2; NetStart 2.0: prediction of eukaryotic translation initiation sites using a protein language model; Line Sandvad Nielsen (0), Anders Gorm Pedersen (1), Ole Winther (2), Henrik Nielsen (3); (0) Section for Computational and RNA Biology, Department of Biology, University of Copenhagen; (1) Section for Bioinformatics, Department of Health Technology, Technical University of Denmark; (2) Section for Computational and RNA Biology, Department of Biology, University of Copenhagen; (3) Section for Bioinformatics, Department of Health Technology, Technical University of Denmark; Abstract. Background: Accurate identification of translation initiation sites is essential for the proper translation of mRNA into functional proteins. In eukaryotes, the choice of the translation initiation site is influenced by multiple factors, including its proximity to the 5′ end and the local start codon context. Translation initiation sites mark the transition from non-coding to coding regions. This fact motivates the expectation that the upstream sequence, if translated, would assemble a nonsensical order of amino acids, while the downstream sequence would correspond to the structured beginning of a protein. This distinction suggests potential for predicting translation initiation sites using a protein language model. Results: We present NetStart 2.0, a deep learning-based model that integrates the ESM-2 protein language model with the local sequence context to predict translation initiation sites across a broad range of eukaryotic species. NetStart 2.0 was trained as a single model across multiple species, and despite the broad phylogenetic diversity represented in the training data, it consistently relied on features marking the transition from non-coding to coding regions. Conclusion: By leveraging “protein-ness”, NetStart 2.0 achieves state-of-the-art performance in predicting translation initiation sites across a diverse range of eukaryotic species. This success underscores the potential of protein language models to bridge transcript- and peptide-level information in complex biological prediction tasks. The NetStart 2.0 webserver is available at https://services.healthtech.dtu.dk/services/NetStart-2.0/; https://doi.org/10.1186/s12859-025-06220-2; Protein language models; Translation initiation sites; Start codons; Deep learning; “Protein-ness”; Coding potential |
| spellingShingle | Line Sandvad Nielsen; Anders Gorm Pedersen; Ole Winther; Henrik Nielsen; NetStart 2.0: prediction of eukaryotic translation initiation sites using a protein language model; BMC Bioinformatics; Protein language models; Translation initiation sites; Start codons; Deep learning; “Protein-ness”; Coding potential |
| title | NetStart 2.0: prediction of eukaryotic translation initiation sites using a protein language model |
| title_sort | netstart 2 0 prediction of eukaryotic translation initiation sites using a protein language model |
| topic | Protein language models; Translation initiation sites; Start codons; Deep learning; “Protein-ness”; Coding potential |
| url | https://doi.org/10.1186/s12859-025-06220-2 |
| work_keys_str_mv | AT linesandvadnielsen netstart20predictionofeukaryotictranslationinitiationsitesusingaproteinlanguagemodel AT andersgormpedersen netstart20predictionofeukaryotictranslationinitiationsitesusingaproteinlanguagemodel AT olewinther netstart20predictionofeukaryotictranslationinitiationsitesusingaproteinlanguagemodel AT henriknielsen netstart20predictionofeukaryotictranslationinitiationsitesusingaproteinlanguagemodel |
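
The abstract's premise is that sequence upstream of a true start codon, if translated, reads as nonsense, while the downstream sequence reads as the beginning of a protein. The sketch below illustrates only the preprocessing step this implies: translating the in-frame windows on either side of a candidate ATG into peptides that a protein language model could then score. It is not NetStart 2.0's implementation. The function names, the 10-codon window, and the use of the standard genetic code are assumptions made for illustration, and no language model is called.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate a nucleotide string codon by codon, starting at index 0.

    A trailing incomplete codon is dropped; codons with unknown bases map to 'X'.
    """
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def flanking_peptides(transcript: str, atg_pos: int, n_codons: int = 10):
    """Return (upstream_peptide, downstream_peptide) for a candidate start codon.

    Both windows are read in the frame of the ATG at 0-based index atg_pos.
    n_codons is a hypothetical window size, not a value taken from the paper.
    """
    seq = transcript.upper().replace("U", "T")
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    # Use only whole codons that fit between the 5' end and the candidate ATG.
    up_codons = min(n_codons, atg_pos // 3)
    upstream = translate(seq[atg_pos - 3 * up_codons:atg_pos])
    downstream = translate(seq[atg_pos:atg_pos + 3 * n_codons])
    return upstream, downstream


if __name__ == "__main__":
    toy = "CCTAAGCCATGGCTAAATG"  # made-up sequence with a candidate ATG at index 8
    print(flanking_peptides(toy, 8))
```

In this toy sequence the upstream window contains a stop codon (`*`), the kind of signal that would make an upstream "peptide" look unlike a real protein, whereas the downstream window begins with methionine.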