Phylogenetic tree-based amino acid sequence generation for proteomics data analysis of unknown species

Bibliographic Details
Main Authors:	Nobuaki Miura, Tsuyoshi Tabata, Yasushi Ishihama, Shujiro Okuda
Format:	Article
Language:	English
Published:	Elsevier 2025-01-01
Series:	Computational and Structural Biotechnology Journal
Subjects:	Amino acid sequence generation; Proteomics data analysis; Peptide identification; Spectral matching; Random branch; Ion Cover Score
Online Access:	http://www.sciencedirect.com/science/article/pii/S2001037025002041

Description
Summary:	In bottom-up proteomics, selecting an appropriate protein amino acid sequence database is vital for reliable peptide identification. However, database-dependent identification excludes species with unsequenced genomes, which limits the comprehensiveness of the analysis. This is a major challenge in microbiota proteomics, a rapidly developing field that involves simultaneously assigning proteins to species in a sample and analyzing them using databases of protein amino acid sequences from species with known genomes. We aimed to develop a method that extends the species diversity of the database by generating protein amino acid sequences of unknown species from the phylogenetic relationships among known species. To evaluate this approach, we generated the Helicobacter pylori F16 strain sequence based on the phylogenetic relationships of 29 closely related strains (excluding F16). With sequence generation, the percentage of peptides that matched peptides obtained from the reference F16 strain increased by 5%. Proteomics data analyses were then performed on the F16 strain using the generated sequence database to validate peptide identification. Peptide-spectrum matches decreased when the database was expanded using sequence generation, owing to a decrease in sensitivity caused primarily by an increase in decoy hits. This decrease in identification sensitivity with large-scale databases could be improved by introducing a novel score based on spectral matching, the Ion Cover Score. The sequence generation method used in the present study, together with scores based on spectral matching, could accelerate proteomics development.
ISSN:	2001-0370

Similar Items