Using de novo assembly to identify structural variation of eight complex immune system gene regions.


Bibliographic Details
Main Authors: Jia-Yuan Zhang, Hannah Roberts, David S C Flores, Antony J Cutler, Andrew C Brown, Justin P Whalley, Olga Mielczarek, David Buck, Helen Lockstone, Barbara Xella, Karen Oliver, Craig Corton, Emma Betteridge, Rachael Bashford-Rogers, Julian C Knight, John A Todd, Gavin Band
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-08-01
Series: PLoS Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8363018
collection DOAJ
description Driven by the necessity to survive environmental pathogens, the human immune system has evolved exceptional diversity and plasticity, to which several factors contribute including inheritable structural polymorphism of the underlying genes. Characterizing this variation is challenging due to the complexity of these loci, which contain extensive regions of paralogy, segmental duplication and high copy-number repeats, but recent progress in long-read sequencing and optical mapping techniques suggests this problem may now be tractable. Here we assess this by using long-read sequencing platforms from PacBio and Oxford Nanopore, supplemented with short-read sequencing and Bionano optical mapping, to sequence DNA extracted from CD14+ monocytes and peripheral blood mononuclear cells from a single European individual identified as HV31. We use this data to build a de novo assembly of eight genomic regions encoding four key components of the immune system, namely the human leukocyte antigen, immunoglobulins, T cell receptors, and killer-cell immunoglobulin-like receptors. Validation of our assembly using k-mer based and alignment approaches suggests that it has high accuracy, with estimated base-level error rates below 1 in 10 kb, although we identify a small number of remaining structural errors. We use the assembly to identify heterozygous and homozygous structural variation in comparison to GRCh38. Despite analyzing only a single individual, we find multiple large structural variants affecting core genes at all three immunoglobulin regions and at two of the three T cell receptor regions. Several of these variants are not accurately callable using current algorithms, implying that further methodological improvements are needed. Our results demonstrate that assessing haplotype variation in these regions is possible given sufficiently accurate long-read and associated data. 
Continued reductions in the cost of these technologies will enable application of these methods to larger samples and provide a broader catalogue of germline structural variation at these loci, an important step toward making these regions accessible to large-scale genetic association studies.
id doaj-art-bea2427071be4bdeb11fe0c7dccabc1c
institution Kabale University
issn 1553-734X
1553-7358
spelling PLoS Computational Biology 17(8): e1009254 (2021-08-01). doi: 10.1371/journal.pcbi.1009254