Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data

Abstract: Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this stud...

Bibliographic Details
Main Authors:	Shunichi Kosugi, Chikashi Terao
Format:	Article
Language:	English
Published:	Nature Publishing Group 2024-04-01
Series:	Human Genome Variation
Online Access:	https://doi.org/10.1038/s41439-024-00276-x

Collection:	DOAJ
Record ID:	doaj-art-09af9dd84c9a408aa4e836fb2e3d9764
Institution:	Kabale University
ISSN:	2054-345X
Affiliations (as listed in the record):	Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research; Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences
Description:	Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.

Similar Items