Calculating orthologs in bacteria and Archaea: a divide and conquer approach.

Bibliographic Details
Main Authors: Mihail R Halachev, Nicholas J Loman, Mark J Pallen
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2011
Series: PLoS ONE
Citation: PLoS ONE 6(12): e28388
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0028388
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0028388&type=printable

Abstract
Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching, which is prohibitively expensive in terms of hardware requirements or computational time (an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation that applies a "divide and conquer" approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at the species level, xBASE-Orth carefully constructs pan-genomes and uses them as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree, reusing previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation and makes ortholog computation feasible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes, comprising more than 4 million coding sequences, in 5 weeks and predicted more than 700 million ortholog pairs, clustered into 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and, in so doing, have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows scalable and efficient computation of bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations.
The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.
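
The hierarchical idea described in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not xBASE-Orth's implementation: difflib's `SequenceMatcher` similarity ratio stands in for a BLAST alignment, a greedy representative clustering with a hypothetical 0.8 threshold stands in for ortholog assignment, and the names `Aligner`, `build_pan_genome`, `climb`, `count_cds` and `all_vs_all_pairs` are illustrative only. The sketch counts pairwise comparisons when only each lower level's pan-genome representatives are carried up the taxonomy, and contrasts that with a naive all-against-all count.

```python
from difflib import SequenceMatcher


class Aligner:
    """Hypothetical stand-in for BLAST: compares two sequences and counts calls.

    Assumption: difflib's similarity ratio (0..1) replaces a real alignment
    score, and `threshold` is an illustrative cut-off for "same group".
    """

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.calls = 0  # number of pairwise comparisons performed

    def same_group(self, a: str, b: str) -> bool:
        self.calls += 1
        return SequenceMatcher(None, a, b).ratio() >= self.threshold


def build_pan_genome(seqs: list, aligner: Aligner) -> list:
    """Greedy clustering: each sequence joins the first group whose
    representative (first member) it matches, else it founds a new group."""
    groups = []
    for s in seqs:
        for group in groups:
            if aligner.same_group(group[0], s):
                group.append(s)
                break
        else:
            groups.append([s])
    return groups


def climb(taxon, aligner: Aligner) -> list:
    """Return pan-genome representatives for a taxon.

    A taxon is either a species (list of genomes, each a list of CDS strings)
    or a higher rank (dict mapping child names to child taxa). Higher ranks
    only see the representatives already computed for their children.
    """
    if isinstance(taxon, dict):
        seqs = [rep for child in taxon.values() for rep in climb(child, aligner)]
    else:
        seqs = [cds for genome in taxon for cds in genome]
    return [group[0] for group in build_pan_genome(seqs, aligner)]


def count_cds(taxon) -> int:
    """Total number of coding sequences below a taxon."""
    if isinstance(taxon, dict):
        return sum(count_cds(child) for child in taxon.values())
    return sum(len(genome) for genome in taxon)


def all_vs_all_pairs(n: int) -> int:
    """Unordered pairs compared by a naive all-against-all search."""
    return n * (n - 1) // 2


# Made-up toy genus: two species, each with two identical genomes.
genus = {
    "species_1": [["MKTAYIAKQR", "MSLLNVPAGK"], ["MKTAYIAKQR", "MSLLNVPAGK"]],
    "species_2": [["MKTAYIAKQK", "MGGHHLLPQW"], ["MKTAYIAKQK", "MGGHHLLPQW"]],
}

if __name__ == "__main__":
    aligner = Aligner(threshold=0.8)
    reps = climb(genus, aligner)
    print(reps)  # ['MKTAYIAKQR', 'MSLLNVPAGK', 'MGGHHLLPQW']
    print(aligner.calls, "hierarchical comparisons vs",
          all_vs_all_pairs(count_cds(genus)), "all-against-all pairs")
```

On this made-up genus, the hierarchical pass makes 12 comparisons where a naive all-against-all search over the same 8 sequences would cover 28 unordered pairs; part of that saving in the toy comes from the greedy clustering itself, not only from the hierarchy. For scale, `all_vs_all_pairs(4_000_000)` gives roughly 8 × 10^12 pairs, which illustrates why an all-against-all search over more than 4 million coding sequences is so costly.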