HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly

Background. The rapid advancement of sequencing technologies has made it possible to regularly produce millions of high-quality reads from the DNA samples in the sequencing laboratories. To this end, the de Bruijn graph is a popular data structure in the genome assembly literature for efficient repr...

Main Authors: Md Mahfuzer Rahman, Ratul Sharker, Sajib Biswas, M. Sohel Rahman
Format: Article
Language: English
Published: Wiley 2017-01-01
Series: International Journal of Genomics
Online Access: http://dx.doi.org/10.1155/2017/6120980
collection DOAJ
description Background. Rapid advances in sequencing technology have made it routine for sequencing laboratories to produce millions of high-quality reads from DNA samples. The de Bruijn graph is a popular data structure in the genome assembly literature for representing and processing these reads efficiently. Because a de Bruijn graph can contain a very large number of nodes, memory consumption and running time are the main barriers to its use, and the problem has therefore received significant attention in the contemporary literature. Results. In this paper, we present an approach called HaVec that attempts to balance memory consumption against running time. HaVec stores the de Bruijn graph in a hash table combined with an auxiliary vector data structure, thereby improving both total memory usage and running time. A critical and noteworthy feature of HaVec is that it produces no false positive errors. Conclusions. In general, graph construction takes the major share of the time in an assembly process, and HaVec can be seen as a significant advancement in this respect. We anticipate that HaVec will be extremely useful in de Bruijn graph-based genome assembly.
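The hash-table-plus-vector idea in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `HashVectorGraph`, the 2-bit base encoding, the modulo hash, and the `[quotient, edge_mask]` entry layout are all assumptions made for illustration. Each table slot holds a small vector of entries, and each entry stores the quotient of the encoded k-mer, so two k-mers that collide in a slot can still be told apart and a lookup never reports a k-mer that was not inserted.

```python
from typing import List, Optional

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def encode(kmer: str) -> int:
    """Pack a k-mer over ACGT into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


class HashVectorGraph:
    """Hypothetical sketch: a hash table whose slots are small vectors."""

    def __init__(self, k: int, table_size: int) -> None:
        self.k = k
        self.table_size = table_size
        # One vector per slot; each entry is [quotient, out_edge_mask].
        self.table: List[List[List[int]]] = [[] for _ in range(table_size)]

    def _find(self, value: int, create: bool = False) -> Optional[List[int]]:
        # Slot and quotient together identify the k-mer exactly.
        slot, quotient = value % self.table_size, value // self.table_size
        bucket = self.table[slot]
        for entry in bucket:
            if entry[0] == quotient:
                return entry
        if create:
            entry = [quotient, 0]
            bucket.append(entry)
            return entry
        return None

    def add_read(self, read: str) -> None:
        k = self.k
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if any(b not in CODE for b in kmer):
                continue  # skip k-mers with ambiguous bases such as N
            entry = self._find(encode(kmer), create=True)
            if i + k < len(read) and read[i + k] in CODE:
                # Record the outgoing edge as one bit per possible next base.
                entry[1] |= 1 << CODE[read[i + k]]

    def contains(self, kmer: str) -> bool:
        if len(kmer) != self.k or any(b not in CODE for b in kmer):
            return False
        return self._find(encode(kmer)) is not None

    def successors(self, kmer: str) -> List[str]:
        if not self.contains(kmer):
            return []
        mask = self._find(encode(kmer))[1]
        return [kmer[1:] + b for i, b in enumerate(BASES) if (mask >> i) & 1]


graph = HashVectorGraph(k=3, table_size=5)
graph.add_read("ACGTAC")
```

With `table_size=5`, the k-mers `GTA` and `TAC` fall into the same slot but carry different quotients, so both remain distinguishable. The k-mer `GCA`, which was never inserted, shares a slot with `ACG` and is still rejected by `contains`.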
format Article
id doaj-art-2bc720adde8e449ab55686f5bbfb4fb1
institution Kabale University
issn 2314-436X
2314-4378
language English
publishDate 2017-01-01
publisher Wiley
record_format Article
series International Journal of Genomics
spelling doaj-art-2bc720adde8e449ab55686f5bbfb4fb1 | 2025-02-03T01:20:23Z | eng | Wiley | International Journal of Genomics | 2314-436X | 2314-4378 | 2017-01-01 | 2017 | 10.1155/2017/6120980 | 6120980 | HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly | Md Mahfuzer Rahman, Ratul Sharker, Sajib Biswas, M. Sohel Rahman (all: Department of CSE, BUET, ECE Building West Palasi, Dhaka 1205, Bangladesh) | http://dx.doi.org/10.1155/2017/6120980