A multi-species benchmark for training and validating mass spectrometry proteomics machine learning models
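Abstract: Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from nine different species. The dataset is based on a previously described benchmark but has been re-processed to ensure consistent data quality and enforce separation of training and test peptides.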
| Main Authors: | Bo Wen, William Stafford Noble |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-11-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-024-04068-4 |
| _version_ | 1850062120755920896 |
|---|---|
| author | Bo Wen William Stafford Noble |
| collection | DOAJ |
| description | Abstract Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from nine different species. The dataset is based on a previously described benchmark but has been re-processed to ensure consistent data quality and enforce separation of training and test peptides. |
| format | Article |
| id | doaj-art-48c4b8cf1b9a4b65bf8b3e7c239efaad |
| institution | DOAJ |
| issn | 2052-4463 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Data |
| spelling | doaj-art-48c4b8cf1b9a4b65bf8b3e7c239efaad2025-08-20T02:50:00ZengNature PortfolioScientific Data2052-44632024-11-011111510.1038/s41597-024-04068-4A multi-species benchmark for training and validating mass spectrometry proteomics machine learning modelsBo Wen0William Stafford Noble1Department of Genome Sciences, University of WashingtonDepartment of Genome Sciences, University of WashingtonAbstract Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from nine different species. The dataset is based on a previously described benchmark but has been re-processed to ensure consistent data quality and enforce separation of training and test peptides.https://doi.org/10.1038/s41597-024-04068-4 |
| title | A multi-species benchmark for training and validating mass spectrometry proteomics machine learning models |
| url | https://doi.org/10.1038/s41597-024-04068-4 |