A reference pan-genome approach to comparative bacterial genomics: identification of novel epidemiological markers in pathogenic Campylobacter.

Bibliographic Details
Main Authors: Guillaume Méric, Koji Yahara, Leonardos Mageiros, Ben Pascoe, Martin C J Maiden, Keith A Jolley, Samuel K Sheppard
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0092798&type=printable
collection DOAJ
description The increasing availability of hundreds of whole bacterial genomes provides opportunities for enhanced understanding of the genes and alleles responsible for clinically important phenotypes and how they evolved. However, it is a significant challenge to develop easy-to-use and scalable methods for characterizing these large and complex data and relating them to disease epidemiology. Existing approaches typically focus on either homologous sequence variation in genes that are shared by all isolates, or non-homologous sequence variation, focusing on genes that are differentially present in the population. Here we present a comparative genomics approach that simultaneously approximates core and accessory genome variation in pathogen populations and apply it to pathogenic species in the genus Campylobacter. A total of 7 published Campylobacter jejuni and Campylobacter coli genomes were selected to represent diversity across these species, and a list of all loci that were present at least once was compiled. After filtering duplicates, a 7-isolate reference pan-genome of 3,933 loci was defined. A core genome of 1,035 genes was ubiquitous in the sample, accounting for 59% of the genes in each isolate (average genome size of 1.68 Mb). The accessory genome contained 2,792 genes. A Campylobacter population sample of 192 genomes was screened for the presence of reference pan-genome loci, with gene presence defined as a BLAST match of ≥ 70% identity over ≥ 50% of the locus length, and aligned using MUSCLE on a gene-by-gene basis. A total of 21 genes were present only in C. coli and 27 only in C. jejuni, providing information about functional differences associated with species and novel epidemiological markers for population genomic analyses. Homologs of these genes were found in several of the genomes used to define the pan-genome and, therefore, would not have been identified using a single reference strain approach.
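The presence rule and the core/accessory split described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the `Hit` record, the function names, and the toy data are hypothetical, and the input is assumed to be one best BLAST hit per reference locus per genome, already parsed. Only the two thresholds (≥ 70% identity over ≥ 50% of the locus length) come from the abstract. "Species-specific" is taken here to mean present in at least one genome of one species and in no genome of any other, which is an assumption about how "present only in" was applied.

```python
from dataclasses import dataclass

MIN_IDENTITY = 70.0  # percent identity threshold quoted in the abstract
MIN_COVERAGE = 0.50  # fraction of locus length quoted in the abstract


@dataclass(frozen=True)
class Hit:
    """Best BLAST hit of one reference locus in one genome (hypothetical record)."""
    genome: str
    locus: str
    identity: float    # percent identity of the alignment
    aln_length: int    # aligned length in bp
    locus_length: int  # full length of the reference locus in bp


def is_present(hit):
    """Apply both thresholds: identity and fraction of the locus covered."""
    coverage = hit.aln_length / hit.locus_length
    return hit.identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


def presence_matrix(hits):
    """Map each genome to the set of loci called present in it."""
    present = {}
    for hit in hits:
        loci = present.setdefault(hit.genome, set())
        if is_present(hit):
            loci.add(hit.locus)
    return present


def core_and_accessory(present, all_loci):
    """Core = loci present in every genome; accessory = the remaining loci."""
    core = set(all_loci)
    for loci in present.values():
        core &= loci
    return core, set(all_loci) - core


def species_specific(present, species_of):
    """Loci seen in at least one genome of a species and in no other species."""
    by_species = {}
    for genome, loci in present.items():
        by_species.setdefault(species_of[genome], set()).update(loci)
    specific = {}
    for species, loci in by_species.items():
        others = set().union(
            *(l for s, l in by_species.items() if s != species)
        )
        specific[species] = loci - others
    return specific


# Toy data, invented purely for illustration.
hits = [
    Hit("g1", "locA", 95.0, 900, 1000),
    Hit("g1", "locB", 80.0, 600, 1000),
    Hit("g2", "locA", 72.0, 500, 1000),   # passes: exactly 50% coverage
    Hit("g2", "locB", 65.0, 1000, 1000),  # fails the identity threshold
    Hit("g2", "locC", 90.0, 400, 1000),   # fails the coverage threshold
]
species_of = {"g1": "C. jejuni", "g2": "C. coli"}

present = presence_matrix(hits)
core, accessory = core_and_accessory(present, {"locA", "locB", "locC"})
specific = species_specific(present, species_of)
```

With the toy data, `locA` is the only core locus, `locB` and `locC` are accessory, and `locB` is the only locus specific to the genome labelled C. jejuni.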
id doaj-art-4df32dad52484a5facabe5fab5ba3f8f
institution DOAJ
issn 1932-6203
citation PLoS ONE 9(3): e92798 (2014)
doi 10.1371/journal.pone.0092798