Large-scale data mining of four billion human antibody variable regions reveals convergence between therapeutic and natural antibodies that constrains search space for biologics drug discovery

The naïve human antibody repertoire has theoretical access to an estimated > 10¹⁵ antibodies. Identifying subsets of this prohibitively large space where therapeutically relevant antibodies may be found is useful for development of these agents. It was previously demonstrated that, despite the im...

Bibliographic Details
Main Authors: Pawel Dudzic, Dawid Chomicz, Jarosław Kończak, Tadeusz Satława, Bartosz Janusz, Sonia Wrobel, Tomasz Gawłowski, Igor Jaszczyszyn, Weronika Bielska, Samuel Demharter, Roberto Spreafico, Lukas Schulte, Kyle Martin, Stephen R. Comeau, Konrad Krawczyk
Format: Article
Language: English
Published: Taylor & Francis Group 2024-12-01
Series: mAbs
Subjects: CDR-H3, database, repertoire
Online Access: https://www.tandfonline.com/doi/10.1080/19420862.2024.2361928
author Pawel Dudzic
Dawid Chomicz
Jarosław Kończak
Tadeusz Satława
Bartosz Janusz
Sonia Wrobel
Tomasz Gawłowski
Igor Jaszczyszyn
Weronika Bielska
Samuel Demharter
Roberto Spreafico
Lukas Schulte
Kyle Martin
Stephen R. Comeau
Konrad Krawczyk
collection DOAJ
description The naïve human antibody repertoire has theoretical access to an estimated > 10¹⁵ antibodies. Identifying subsets of this prohibitively large space where therapeutically relevant antibodies may be found is useful for development of these agents. It was previously demonstrated that, despite the immense sequence space, different individuals can produce the same antibodies. It was also shown that therapeutic antibodies, which typically follow seemingly unnatural development processes, can arise independently naturally. To check for biases in how the sequence space is explored, we data mined public repositories to identify 220 bioprojects with a combined seven billion reads. Of these, we created a subset of human bioprojects that we make available as the AbNGS database (https://naturalantibody.com/ngs/). AbNGS contains 135 bioprojects with four billion productive human heavy variable region sequences and 385 million unique complementarity-determining region (CDR)-H3s. We find that 270,000 (0.07% of 385 million) unique CDR-H3s are highly public in that they occur in at least five of 135 bioprojects. Of 700 unique therapeutic CDR-H3s, 6% have direct matches in the small set of 270,000. This observation extends to a match between CDR-H3 and V-gene call as well. Thus, the subspace of shared (‘public’) CDR-H3s shows utility for serving as a starting point for therapeutic antibody design.
format Article
id doaj-art-7c74ae2f3b3c4564ab5b22c1be596002
institution Kabale University
issn 1942-0862
1942-0870
language English
publishDate 2024-12-01
publisher Taylor & Francis Group
record_format Article
series mAbs
spelling doaj-art-7c74ae2f3b3c4564ab5b22c1be596002
2025-01-31T04:19:37Z
eng
Taylor & Francis Group
mAbs
1942-0862
1942-0870
2024-12-01
volume 16, issue 1
10.1080/19420862.2024.2361928
Pawel Dudzic (NaturalAntibody, Szczecin, Poland)
Dawid Chomicz (NaturalAntibody, Szczecin, Poland)
Jarosław Kończak (NaturalAntibody, Szczecin, Poland)
Tadeusz Satława (NaturalAntibody, Szczecin, Poland)
Bartosz Janusz (NaturalAntibody, Szczecin, Poland)
Sonia Wrobel (NaturalAntibody, Szczecin, Poland)
Tomasz Gawłowski (NaturalAntibody, Szczecin, Poland)
Igor Jaszczyszyn (NaturalAntibody, Szczecin, Poland)
Weronika Bielska (NaturalAntibody, Szczecin, Poland)
Samuel Demharter (Discovery Data Science, Genmab, Copenhagen, Denmark)
Roberto Spreafico (Discovery Data Science, Genmab, Utrecht, The Netherlands)
Lukas Schulte (Global Computational Biology & Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany)
Kyle Martin (Biotherapeutics Discovery, Boehringer Ingelheim, Ridgefield, CT, USA)
Stephen R. Comeau (Biotherapeutics Discovery, Boehringer Ingelheim, Ridgefield, CT, USA)
Konrad Krawczyk (NaturalAntibody, Szczecin, Poland)
title Large-scale data mining of four billion human antibody variable regions reveals convergence between therapeutic and natural antibodies that constrains search space for biologics drug discovery
topic CDR-H3
database
repertoire
url https://www.tandfonline.com/doi/10.1080/19420862.2024.2361928
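
The description defines a CDR-H3 as "highly public" when it occurs in at least five of the 135 bioprojects, and reports the share of therapeutic CDR-H3s with a direct match in that set. The sketch below illustrates that criterion on toy data only. The function names, the (bioproject, CDR-H3) record layout, and the example sequences are hypothetical, and this is not the authors' pipeline or the AbNGS interface.

```python
from collections import defaultdict


def public_cdrh3s(records, min_projects=5):
    """Return CDR-H3 sequences seen in at least `min_projects` distinct bioprojects.

    `records` is an iterable of (bioproject_id, cdrh3) pairs (assumed layout).
    """
    projects_by_cdr = defaultdict(set)
    for bioproject_id, cdrh3 in records:
        # A set per sequence, so repeated reads in one bioproject count once.
        projects_by_cdr[cdrh3].add(bioproject_id)
    return {
        cdr for cdr, projects in projects_by_cdr.items()
        if len(projects) >= min_projects
    }


def match_fraction(therapeutic_cdrh3s, public_set):
    """Fraction of unique therapeutic CDR-H3s with an exact match in `public_set`."""
    unique = set(therapeutic_cdrh3s)
    if not unique:
        return 0.0
    return len(unique & public_set) / len(unique)


# Toy data: made-up sequences and bioproject identifiers.
records = [(f"PRJ{i}", "ARDYW") for i in range(5)]            # seen in 5 bioprojects
records += [("PRJ0", "ARGGW"), ("PRJ1", "ARGGW"), ("PRJ0", "ARDYW")]

public = public_cdrh3s(records)
fraction = match_fraction(["ARDYW", "ARWWW"], public)

# Share quoted in the description: 270,000 of 385 million unique CDR-H3s.
public_share_percent = round(270_000 / 385_000_000 * 100, 2)
```

On the toy data only "ARDYW" reaches the five-bioproject threshold, so one of the two hypothetical therapeutic sequences matches. The last line recomputes the 0.07% figure quoted in the description.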