High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks
Abstract: Engineered T-cell receptor (eTCR) systems rely on accurately generated T-cell receptor (TCR) sequences to enhance immunotherapy predictability and efficacy. The most variable and crucial part of the TCR is the CDR3 region. Current methods for generating CDR3 sequences, inc...
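| Main Authors: | Piotr Religa, Michel-Edwar Mickael, Norwin Kubick, Jarosław Olav Horbańczuk, Nikko Floretes, Mariusz Sacharczuk, Atanas G. Atanasov |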
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-01172-2 |
| _version_ | 1849687936555024384 |
|---|---|
| author | Piotr Religa; Michel-Edwar Mickael; Norwin Kubick; Jarosław Olav Horbańczuk; Nikko Floretes; Mariusz Sacharczuk; Atanas G. Atanasov |
| author_facet | Piotr Religa; Michel-Edwar Mickael; Norwin Kubick; Jarosław Olav Horbańczuk; Nikko Floretes; Mariusz Sacharczuk; Atanas G. Atanasov |
| author_sort | Piotr Religa |
| collection | DOAJ |
| description | Abstract: Engineered T-cell receptor (eTCR) systems rely on accurately generated T-cell receptor (TCR) sequences to enhance immunotherapy predictability and efficacy. The most variable and crucial part of the TCR is the CDR3 region. Current methods for generating CDR3 sequences, including motif-based and Markov models, struggle to generate reliable, diverse, and novel TCR sequences. In this study, we present the first application of Generative Adversarial Networks (GANs) for producing biologically reliable CDR3 sequences, using Long Short-Term Memory (LSTM) networks and LeakyReLU-based GANs. Our results show that LSTM models generate more diverse sequences with higher accuracy, lower discriminator loss, and higher AUC than LeakyReLU models. However, LeakyReLU models provide greater stability with a lower generator loss, achieving a total Pearson correlation score of over 0.9. Both models demonstrate the ability to produce highly realistic TCR sequences, as validated by t-SNE clustering, frequency distribution analysis, TCRd3 BLAST analysis, and in silico docking. These findings highlight the potential of GANs as a powerful tool for generating synthetic yet biologically relevant TCR sequences, a crucial step toward improving eTCR-based therapies. Further refinement of amino acid frequency distributions and clinical validation will enhance their applicability for therapeutic purposes. |
| format | Article |
| id | doaj-art-2810ceccc32c4b43be98db87302f075f |
| institution | DOAJ |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-2810ceccc32c4b43be98db87302f075f; 2025-08-20T03:22:12Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-05-01; 151113; 10.1038/s41598-025-01172-2; High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks; Piotr Religa (0), Michel-Edwar Mickael (1), Norwin Kubick (2), Jarosław Olav Horbańczuk (3), Nikko Floretes (4), Mariusz Sacharczuk (5), Atanas G. Atanasov (6); (0) Department of Medicine, Karolinska Institute; (1) Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences; (2) Department of Biology, Institute of Plant Science and Microbiology, University of Hamburg; (3) Department of Medicine, Karolinska Institute; (4) College of Engineering, Samar State University; (5) Department of Pharmacodynamics, Faculty of Pharmacy, Medical University of Warsaw; (6) Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences; Abstract: Engineered T-cell receptor (eTCR) systems rely on accurately generated T-cell receptor (TCR) sequences to enhance immunotherapy predictability and efficacy. The most variable and crucial part of the TCR is the CDR3 region. Current methods for generating CDR3 sequences, including motif-based and Markov models, struggle to generate reliable, diverse, and novel TCR sequences. In this study, we present the first application of Generative Adversarial Networks (GANs) for producing biologically reliable CDR3 sequences, using Long Short-Term Memory (LSTM) networks and LeakyReLU-based GANs. Our results show that LSTM models generate more diverse sequences with higher accuracy, lower discriminator loss, and higher AUC than LeakyReLU models. However, LeakyReLU models provide greater stability with a lower generator loss, achieving a total Pearson correlation score of over 0.9. Both models demonstrate the ability to produce highly realistic TCR sequences, as validated by t-SNE clustering, frequency distribution analysis, TCRd3 BLAST analysis, and in silico docking. These findings highlight the potential of GANs as a powerful tool for generating synthetic yet biologically relevant TCR sequences, a crucial step toward improving eTCR-based therapies. Further refinement of amino acid frequency distributions and clinical validation will enhance their applicability for therapeutic purposes.; https://doi.org/10.1038/s41598-025-01172-2 |
| spellingShingle | Piotr Religa Michel-Edwar Mickael Norwin Kubick Jarosław Olav Horbańczuk Nikko Floretes Mariusz Sacharczuk Atanas G. Atanasov High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks Scientific Reports |
| title | High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks |
| title_full | High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks |
| title_fullStr | High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks |
| title_full_unstemmed | High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks |
| title_short | High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks |
| title_sort | high fidelity in silico generation and augmentation of tcr repertoire data using generative adversarial networks |
| url | https://doi.org/10.1038/s41598-025-01172-2 |