Benchmarking of variant calling software for whole-exome sequencing using gold standard datasets

Bibliographic Details
Main Authors: Matthew Wong, Bryan Liew, Melissa Hum, Ning Yuan Lee, Ann S. G. Lee
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-97047-7

Abstract

Accurate variant calling from whole-exome sequencing (WES) data is vital for understanding genetic diseases. Recently, commercial variant calling software packages have emerged that require no bioinformatics or programming expertise. They enable smaller laboratories and clinics to analyse WES data independently, without dedicated, expensive computers or bioinformatics staff. This study benchmarks four such no-programming variant calling packages, namely Illumina BaseSpace Sequence Hub (Illumina), CLC Genomics Workbench (CLC), Partek Flow, and Varsome Clinical, on three Genome in a Bottle (GIAB) whole-exome sequencing datasets (HG001, HG002 and HG003). After sequence reads were aligned to the human reference genome GRCh38, the resulting variant calls were compared against the high-confidence regions of the GIAB datasets and assessed using the Variant Calling Assessment Tool (VCAT). Illumina’s DRAGEN Enrichment achieved the highest precision and recall for both single nucleotide variant (SNV) and insertion/deletion (indel) calling, exceeding 99% for SNVs and 96% for indels, whereas Partek Flow, using the union of variant calls from Freebayes and Samtools, had the lowest indel calling performance. Illumina also had the highest true positive (TP) variant counts for all samples, and the TP variants of all four packages were 98–99% similar. Run times were shortest for CLC (6 to 25 min) and Illumina (29 to 36 min), while Partek Flow took the longest (3.6 to 29.7 h). This study provides information to help clinicians and biologists without programming expertise select variant analysis software that balances accuracy, sensitivity, and runtime.
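The abstract reports precision, recall and true positive (TP) counts for calls restricted to GIAB high-confidence regions. As a rough illustration of what those quantities mean, the sketch below counts TP, false positives (FP) and false negatives (FN) and derives precision = TP/(TP+FP), recall = TP/(TP+FN) and their harmonic mean (F1). It is not the study's method and does not reproduce VCAT: it assumes exact matching on (chromosome, position, REF, ALT) and simple BED-style interval filtering, whereas dedicated comparison tools (for example hap.py) perform haplotype-aware matching that also reconciles differently represented indels. The names `Variant`, `in_regions`, `is_snv`, `benchmark` and `jaccard` are hypothetical, and the Jaccard index is only one possible reading of the "98–99% similarity" of TP variants, since the abstract does not define that measure.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record, matched exactly on all four fields."""
    chrom: str
    pos: int  # 1-based position, as in a VCF POS column
    ref: str
    alt: str


# 0-based half-open intervals [start, end), as in a BED file; assumed sorted and
# non-overlapping per chromosome (e.g. a GIAB high-confidence BED).
Regions = Dict[str, List[Tuple[int, int]]]


def in_regions(v: Variant, regions: Regions) -> bool:
    """True if the variant's first REF base falls inside a high-confidence interval."""
    intervals = regions.get(v.chrom, [])
    starts = [start for start, _ in intervals]
    zero_based = v.pos - 1  # convert VCF 1-based POS to BED 0-based coordinates
    i = bisect_right(starts, zero_based) - 1  # last interval starting at or before pos
    return i >= 0 and zero_based < intervals[i][1]


def is_snv(v: Variant) -> bool:
    """Simplification: single-base REF and ALT; everything else counts as an indel."""
    return len(v.ref) == 1 and len(v.alt) == 1


def benchmark(calls: Set[Variant], truth: Set[Variant], regions: Regions,
              kind: Optional[str] = None) -> Dict[str, float]:
    """Exact-match TP/FP/FN inside `regions`, optionally for 'snv' or 'indel' only."""
    def keep(v: Variant) -> bool:
        if not in_regions(v, regions):
            return False
        if kind == "snv":
            return is_snv(v)
        if kind == "indel":
            return not is_snv(v)
        return True

    called = {v for v in calls if keep(v)}
    expected = {v for v in truth if keep(v)}
    tp = len(called & expected)  # called and present in the truth set
    fp = len(called - expected)  # called but absent from the truth set
    fn = len(expected - called)  # in the truth set but missed by the caller
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Overlap of two callers' TP sets: |A and B| / |A or B| (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Toy data only; not taken from the study.
    truth = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr1", 300, "G", "GA")}
    calls = {Variant("chr1", 100, "A", "G"), Variant("chr1", 250, "T", "C"),
             Variant("chr1", 300, "G", "GA"), Variant("chr2", 50, "A", "T")}
    regions = {"chr1": [(0, 1000)]}
    print(benchmark(calls, truth, regions))
    print(benchmark(calls, truth, regions, kind="indel"))
```

In the toy example the chr2 call lies outside the high-confidence regions and is ignored, so the overall comparison gives TP=2, FP=1 and FN=1, while the indel-only comparison gives TP=1 with no FP or FN. Stratifying by `kind` mirrors the abstract's separate SNV and indel figures.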

Keywords: Benchmarking; Variant calling; Whole-exome sequencing; No-programming software; GIAB
Affiliation (all authors): Division of Cellular and Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore
ISSN: 2045-2322