Whole-genome sequencing and assembly with high-throughput, short-read technologies.

Bibliographic Details
Main Authors: Andreas Sundquist, Mostafa Ronaghi, Haixu Tang, Pavel Pevzner, Serafim Batzoglou
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2007-05-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0000484&type=printable
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0000484
Citation: PLoS ONE 2007; 2(5): e484

Abstract

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.
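The protocol's central design choice, sampling clones from the genome at high coverage but reads from each clone at low coverage, can be illustrated with standard Lander-Waterman (Poisson) coverage arithmetic. The sketch below is not taken from the paper: the function names and the example parameters (20x clone coverage, 0.5x per-clone read coverage, 150 kb clones) are hypothetical, and it assumes clones and reads are placed uniformly and independently at random. Only the 200 bp read length comes from the abstract.

```python
import math

# Hypothetical helpers illustrating Lander-Waterman (Poisson) coverage arithmetic.
# Assumes clones and reads are placed uniformly and independently at random;
# none of these names or default values come from the SHRAP paper except the 200 bp read length.

READ_LENGTH_BP = 200  # read length assumed in the abstract


def total_read_coverage(clone_coverage: float, per_clone_read_coverage: float) -> float:
    """Expected pooled read coverage of a genomic base.

    On average a base lies inside `clone_coverage` clones; if every clone is
    sequenced to `per_clone_read_coverage`, the reads from all of those clones
    together cover the base this many times.
    """
    return clone_coverage * per_clone_read_coverage


def reads_per_clone(clone_length_bp: int, per_clone_read_coverage: float,
                    read_length_bp: int = READ_LENGTH_BP) -> float:
    """Number of reads needed to sequence one clone to the given coverage."""
    return per_clone_read_coverage * clone_length_bp / read_length_bp


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read: e^(-c)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical parameters for illustration only, NOT taken from the paper.
    clone_cov, read_cov, clone_len = 20.0, 0.5, 150_000
    pooled = total_read_coverage(clone_cov, read_cov)
    print(f"reads per clone: {reads_per_clone(clone_len, read_cov):.0f}")
    print(f"fraction of a clone missed by its own reads: {fraction_uncovered(read_cov):.3f}")
    print(f"pooled read coverage: {pooled:.1f}x")
    print(f"fraction of genome missed by all reads: {fraction_uncovered(pooled):.1e}")
```

With these illustrative numbers, roughly 61% of each clone's bases receive none of that clone's own reads, yet the pooled reads from overlapping clones give about 10x coverage, leaving an expected fraction of roughly 4.5e-5 of bases uncovered. This is one way to see why the protocol orders clones before assembling reads: a single clone's reads are too sparse to assemble on their own, while reads pooled from overlapping, ordered clones are not.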