Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms

Bibliographic Details
Main Authors: Karl R. Franke, Erin L. Crowgey
Format: Article
Language: English
Published: BioMed Central 2020-03-01
Series: Genomics & Informatics
Online Access: http://genominfo.org/upload/pdf/gi-2020-18-1-e10.pdf
Abstract: Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field, as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of the software and hardware used to compute large-scale human genomic datasets (NGS), the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared to the results of the original application (GATK v4.1.0, Intel x86 CPUs). Parabricks was able to process a 50× whole-genome sequencing library in under 3 h and Sentieon finished in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision compared to stock GATK. Sentieon's somatic pipeline achieved similar results (greater than 99%). Additionally, the IBM POWER9 CPU performed well on bioinformatic workloads when tested with 10 different alignment/mapping tools.
ISSN: 2234-0742
DOI: 10.5808/GI.2020.18.1.e10
Volume: 18
Issue: 1
Affiliations: Karl R. Franke and Erin L. Crowgey, Department of Pediatrics, Nemours Alfred I duPont Hospital for Children, Wilmington, DE 19803, USA
Subjects: clinical genomics; genome analysis toolkit; GPUs; next generation sequencing; variant detection