Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

Bibliographic Details
Main Authors: Rodrigo Aniceto, Rene Xavier, Valeria Guimarães, Fernanda Hondo, Maristela Holanda, Maria Emilia Walter, Sérgio Lifschitz
Format: Article
Language: English
Published: Wiley 2015-01-01
Series: International Journal of Genomics
Online Access: http://dx.doi.org/10.1155/2015/502795
collection DOAJ
description Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them is the management of the massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly the storage and analysis of these large-scale processed data. Finding an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with very large amounts of nonconventional data, especially for write and retrieval operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We analyze persistency and I/O operations with real data using the Cassandra database system. We also compare the results with those obtained with a classical relational database system and with another NoSQL database approach, MongoDB.
id doaj-art-dcefeb6fb3a941f98f80b7e852ab7458
institution Kabale University
issn 2314-436X
2314-4378
spelling Record timestamp 2025-02-03T01:09:35Z; International Journal of Genomics, volume 2015, article 502795
Author affiliations:
Rodrigo Aniceto, Rene Xavier, Valeria Guimarães, Fernanda Hondo, Maristela Holanda, Maria Emilia Walter: Computer Science Department, University of Brasilia (UNB), 70910-900 Brasilia, DF, Brazil
Sérgio Lifschitz: Informatics Department, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), 22451-900 Rio de Janeiro, RJ, Brazil