Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data.

Bibliographic Details
Main Authors: Katharina Waury, Stefan Lelieveld, Sanne Abeln, Henk-Jan van den Ham
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-05-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1013057
collection DOAJ
description Repertoire sequencing allows us to investigate the antibody-mediated immune response. The clustering of sequences is a crucial step in the data analysis pipeline, aiding in the identification of functionally related antibodies. The conventional clustering approach of clonotyping relies on sequence information, particularly CDRH3 sequence identity and V/J gene usage, to group sequences into clonotypes. It has been suggested that the limitations of sequence-based approaches to identify sequence-dissimilar but functionally converged antibodies can be overcome by using structure information to group antibodies. Recent advances have made structure-based methods feasible on a repertoire level. However, so far, their performance has only been evaluated on single-antigen sets of antibodies. A comprehensive comparison of the benefits and limitations of structure-based tools on realistic and diverse repertoire data is missing. Here, we aim to explore the promise of structure-based clustering algorithms to replace or augment the standard sequence-based approach, specifically by identifying low-sequence identity groups. Two methods, SAAB+ and SPACE2, are evaluated against clonotyping. We curated a dataset of well-annotated pairs of antibodies that show high overlap in epitope residues and thus bind the same region within their respective antigen. This set of antibodies was introduced into a simulated repertoire to compare the performance of clustering approaches on a diverse antibody set. Our analysis reveals that structure-based methods do group more antibodies together compared to clonotyping. However, it also highlights the limitations associated with the need for same-length CDR regions by SPACE2. This work thoroughly compares the utility of different clustering methods and provides insights into what further steps are required to effectively use antibody structural information to group immune repertoire data.
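The clonotyping step described above (grouping by V/J gene usage and CDRH3 sequence identity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tuple layout, the 0.8 identity threshold, the requirement of equal CDRH3 length, and the single-linkage merging are all assumptions chosen for illustration, and the example sequences are made up.

```python
from collections import defaultdict
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def clonotype(antibodies, min_identity=0.8):
    """Assign a clonotype label to each antibody (hypothetical helper).

    antibodies: list of (v_gene, j_gene, cdrh3) tuples.
    min_identity: assumed CDRH3 identity threshold, not taken from the paper.
    Returns one integer label per antibody, numbered by first appearance.
    """
    # Step 1: bin by V gene, J gene and CDRH3 length (assumed convention).
    bins = defaultdict(list)
    for idx, (v_gene, j_gene, cdrh3) in enumerate(antibodies):
        bins[(v_gene, j_gene, len(cdrh3))].append(idx)

    # Step 2: single-linkage merging inside each bin, using union-find.
    parent = list(range(len(antibodies)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in bins.values():
        for a, b in combinations(members, 2):
            if identity(antibodies[a][2], antibodies[b][2]) >= min_identity:
                parent[find(a)] = find(b)

    # Step 3: relabel union-find roots as consecutive integers.
    labels = {}
    return [labels.setdefault(find(i), len(labels))
            for i in range(len(antibodies))]


# Made-up example records: (V gene, J gene, CDRH3 amino acid sequence).
repertoire = [
    ("IGHV1-69", "IGHJ4", "ARDYYGSGSY"),   # reference sequence
    ("IGHV1-69", "IGHJ4", "ARDYYGSGTY"),   # one mismatch, identity 0.9
    ("IGHV1-69", "IGHJ4", "ARDYYGSGSYW"),  # different CDRH3 length
    ("IGHV3-23", "IGHJ4", "ARDYYGSGSY"),   # different V gene
]
print(clonotype(repertoire))
```

In this sketch only the first two records share a clonotype. The third is separated purely by CDRH3 length and the fourth by V gene, which shows how a sequence-based grouping can keep apart antibodies that might still bind the same epitope.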
id doaj-art-ff65fb147645465da73f240b3069e73c
institution OA Journals
issn 1553-734X
1553-7358
citation PLoS Computational Biology 21(5): e1013057 (2025-05-01). https://doi.org/10.1371/journal.pcbi.1013057