Analysis of targeted and whole genome sequencing of PacBio HiFi reads for a comprehensive genotyping of gene-proximal and phenotype-associated Variable Number Tandem Repeats.

Bibliographic Details
Main Authors: Sara Javadzadeh, Aaron Adamson, Jonghun Park, Se-Young Jo, Yuan-Chun Ding, Mehrdad Bakhtiari, Vikas Bansal, Susan L Neuhausen, Vineet Bafna
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-04-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1012885
Description: Variable Number Tandem Repeats (VNTRs) refer to repeating motifs of size greater than five bp. VNTRs are an important source of genetic variation, and have been associated with multiple Mendelian and complex phenotypes. However, the highly repetitive structures require reads to span the region for accurate genotyping. Pacific Biosciences HiFi sequencing spans large regions and is highly accurate but relatively expensive. Therefore, targeted sequencing approaches coupled with long-read sequencing have been proposed to improve efficiency and throughput. In this paper, we systematically explored the trade-off between targeted and whole genome HiFi sequencing for genotyping VNTRs. We curated a set of 10,787 gene-proximal (G-)VNTRs, and 48 phenotype-associated (P-)VNTRs of interest. Illumina reads only spanned 46% of the G-VNTRs and 71% of P-VNTRs, motivating the use of HiFi sequencing. We performed targeted sequencing with hybridization by designing custom probes for 9,999 VNTRs and sequenced 8 samples using HiFi and Illumina sequencing, followed by adVNTR genotyping. We compared these results against HiFi whole genome sequencing (WGS) data from 28 samples in the Human Pangenome Reference Consortium (HPRC). With the targeted approach, only 4,091 (41%) G-VNTRs and only 4 (8%) of P-VNTRs were spanned with at least 15 reads. A smaller subset of 3,579 (36%) G-VNTRs had higher median coverage of at least 63 spanning reads. The spanning behavior was consistent across all 8 samples. Among 5,638 VNTRs with low coverage (< 15), 67% were located within GC-rich regions (> 60%). In contrast, the 40X WGS HiFi dataset spanned 98% of all VNTRs and 49 (98%) of P-VNTRs with at least 15 spanning reads, albeit with lower coverage. Spanning reads were sufficient for accurate genotyping in both cases. Our findings demonstrate that targeted sequencing provides consistently high coverage for a small subset of low-GC VNTRs, but WGS is more effective for broad and sufficient sampling of a large number of VNTRs.
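The description relies on two simple criteria: a read is useful only if it spans the whole repeat, and a locus is counted as covered when at least 15 reads span it, with GC content above 60% flagged as GC-rich. The following is a minimal sketch of those checks, not the adVNTR implementation. The `Interval` type, the function names, and the optional `flank` parameter are hypothetical and introduced only for illustration; coordinates are assumed to be 0-based and half-open.

```python
from dataclasses import dataclass

# Thresholds quoted in the description above.
MIN_SPANNING_READS = 15   # minimum spanning reads for a locus to count as covered
GC_RICH_FRACTION = 0.60   # loci above this GC fraction are called GC-rich


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval on one chromosome (0-based, half-open)."""
    start: int
    end: int


def spans(read: Interval, vntr: Interval, flank: int = 0) -> bool:
    """True if the read covers the whole VNTR plus `flank` bp on each side."""
    return read.start <= vntr.start - flank and read.end >= vntr.end + flank


def count_spanning(reads, vntr: Interval, flank: int = 0) -> int:
    """Number of reads that fully span the VNTR."""
    return sum(spans(r, vntr, flank) for r in reads)


def is_well_covered(reads, vntr: Interval, flank: int = 0) -> bool:
    """Apply the 'at least 15 spanning reads' criterion."""
    return count_spanning(reads, vntr, flank) >= MIN_SPANNING_READS


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_gc_rich(seq: str) -> bool:
    """Apply the '> 60% GC' criterion."""
    return gc_fraction(seq) > GC_RICH_FRACTION
```

In practice the read intervals would come from alignment records; how they are extracted, and how much flanking sequence a genotyper requires, depends on the tool and is left open here.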
ISSN: 1553-734X, 1553-7358