Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data
Abstract Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this stud...
Main Authors: | Shunichi Kosugi, Chikashi Terao |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Publishing Group, 2024-04-01 |
Series: | Human Genome Variation |
Online Access: | https://doi.org/10.1038/s41439-024-00276-x |
_version_ | 1832594878968627200 |
---|---|
author | Shunichi Kosugi Chikashi Terao |
author_facet | Shunichi Kosugi Chikashi Terao |
author_sort | Shunichi Kosugi |
collection | DOAJ |
description | Abstract Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data. |
format | Article |
id | doaj-art-09af9dd84c9a408aa4e836fb2e3d9764 |
institution | Kabale University |
issn | 2054-345X |
language | English |
publishDate | 2024-04-01 |
publisher | Nature Publishing Group |
record_format | Article |
series | Human Genome Variation |
spelling | doaj-art-09af9dd84c9a408aa4e836fb2e3d9764; 2025-01-19T12:15:40Z; eng; Nature Publishing Group; Human Genome Variation; 2054-345X; 2024-04-01; 111110; 10.1038/s41439-024-00276-x; Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data; Shunichi Kosugi [0]; Chikashi Terao [1]; [0] Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research; [1] Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences; Abstract Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.; https://doi.org/10.1038/s41439-024-00276-x |
spellingShingle | Shunichi Kosugi Chikashi Terao Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data Human Genome Variation |
title | Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data |
title_sort | comparative evaluation of snvs indels and structural variations detected with short and long read sequencing data |
url | https://doi.org/10.1038/s41439-024-00276-x |
work_keys_str_mv | AT shunichikosugi comparativeevaluationofsnvsindelsandstructuralvariationsdetectedwithshortandlongreadsequencingdata AT chikashiterao comparativeevaluationofsnvsindelsandstructuralvariationsdetectedwithshortandlongreadsequencingdata |