Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data

Bibliographic Details
Main Authors: Shunichi Kosugi, Chikashi Terao
Format: Article
Language: English
Published: Nature Publishing Group 2024-04-01
Series: Human Genome Variation
Online Access: https://doi.org/10.1038/s41439-024-00276-x
author Shunichi Kosugi
Chikashi Terao
collection DOAJ
description Abstract Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
format Article
id doaj-art-09af9dd84c9a408aa4e836fb2e3d9764
institution Kabale University
issn 2054-345X
language English
publishDate 2024-04-01
publisher Nature Publishing Group
record_format Article
series Human Genome Variation
spelling doaj-art-09af9dd84c9a408aa4e836fb2e3d9764
2025-01-19T12:15:40Z
eng
Nature Publishing Group
Human Genome Variation
2054-345X
2024-04-01
111110
10.1038/s41439-024-00276-x
Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data
Shunichi Kosugi (Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research)
Chikashi Terao (Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences)
https://doi.org/10.1038/s41439-024-00276-x
title Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data
url https://doi.org/10.1038/s41439-024-00276-x