Benchmarking of variant calling software for whole-exome sequencing using gold standard datasets

Bibliographic Details
Main Authors:	Matthew Wong, Bryan Liew, Melissa Hum, Ning Yuan Lee, Ann S. G. Lee
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-04-01
Series:	Scientific Reports
Subjects:	Benchmarking; Variant calling; Whole-exome sequencing; No-programming software; GIAB
Online Access:	https://doi.org/10.1038/s41598-025-97047-7

collection	DOAJ
description	Accurate variant calling from whole-exome sequencing (WES) data is vital for understanding genetic diseases. Recently, commercial variant calling software packages have emerged that require no bioinformatics or programming expertise. These packages let smaller laboratories and clinics analyse WES data independently, without dedicated, expensive computers or bioinformatics staff. This study benchmarks four variant calling packages that require no programming, namely Illumina BaseSpace Sequence Hub (Illumina), CLC Genomics Workbench (CLC), Partek Flow, and Varsome Clinical, on three Genome in a Bottle (GIAB) whole-exome sequencing datasets (HG001, HG002 and HG003). Sequence reads were aligned to the human reference genome GRCh38. The resulting variant calls were then compared against the GIAB high-confidence regions and assessed with the Variant Calling Assessment Tool (VCAT). Illumina’s DRAGEN Enrichment achieved the highest precision and recall for both single nucleotide variant (SNV) and insertion/deletion (indel) calling, reaching over 99% for SNVs and 96% for indels. Partek Flow, using the union of variant calls from Freebayes and Samtools, had the lowest indel calling performance. Illumina had the highest true-positive (TP) variant counts for all samples, and the TP variants of all four packages were 98–99% similar. Run times were shortest for CLC (6 to 25 min) and Illumina (29 to 36 min), while Partek Flow took the longest (3.6 to 29.7 h). This study helps clinicians and biologists without programming expertise select variant analysis software that balances accuracy, sensitivity, and runtime.
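The precision and recall reported above are conventionally defined as precision = TP / (TP + FP) and recall = TP / (TP + FN). Calls are counted only inside the truth set's high-confidence regions. The Python sketch below illustrates that arithmetic under simplifying assumptions. Variants are reduced to hypothetical (chrom, pos, ref, alt) keys and matched exactly. Dedicated comparison tools such as hap.py perform haplotype-aware matching, which this sketch does not attempt. The abstract also does not say how the 98–99% TP similarity was measured, so the `jaccard` helper is an assumed, illustrative overlap measure and not the study's method. All names and toy variants are hypothetical and are not data from the study.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical simplified variant key; real VCF records carry many more fields."""
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def in_regions(v: Variant, regions: list[tuple[str, int, int]]) -> bool:
    """True if the variant lies in any (chrom, start, end) interval.

    Intervals are assumed to be BED-style (0-based, half-open), so a
    1-based VCF position p corresponds to BED coordinate p - 1.
    """
    return any(c == v.chrom and start <= v.pos - 1 < end for c, start, end in regions)


def precision_recall(calls, truth, regions):
    """Exact-match TP/FP/FN, precision and recall within confident regions."""
    calls = {v for v in calls if in_regions(v, regions)}
    truth = {v for v in truth if in_regions(v, regions)}
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # in the truth set but not called
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return tp, fp, fn, precision, recall


def jaccard(a: set, b: set) -> float:
    """Assumed overlap measure between two callers' TP sets (illustrative only)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Toy, made-up variants; the chr2 variant lies outside the confident regions.
    truth = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr1", 300, "G", "GA"), Variant("chr2", 50, "T", "C")]
    calls = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr1", 250, "A", "C"), Variant("chr2", 50, "T", "C")]
    regions = [("chr1", 0, 1000)]
    tp, fp, fn, p, r = precision_recall(calls, truth, regions)
    print(f"TP={tp} FP={fp} FN={fn} precision={p:.3f} recall={r:.3f}")
    # prints: TP=2 FP=1 FN=1 precision=0.667 recall=0.667
```

Both sets are restricted to the confident regions before counting. As a result, the chr2 variant counts as neither a TP nor an FP here, because it lies outside the hypothetical `regions` list.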
id	doaj-art-b2f08567dec64da3bfa6e6a0ee0616de
institution	OA Journals
issn	2045-2322
affiliation	Division of Cellular and Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore (all five authors)
