LatentDE: latent-based directed evolution for protein sequence design
Directed evolution (DE) has been the most effective method for protein engineering that optimizes biological functionalities through a resource-intensive process of screening or selecting among a vast range of mutations. To mitigate this extensive procedure, recent advancements in machine learning-g...
| Main Authors: | Thanh V T Tran, Nhat Khang Ngo, Viet Thanh Duy Nguyen, Truong-Son Hy |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IOP Publishing, 2025-01-01 |
| Series: | Machine Learning: Science and Technology |
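| Subjects: | directed evolution; protein representation learning; protein design; latent-based optimization; evolutionary algorithm |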
| Online Access: | https://doi.org/10.1088/2632-2153/adc2e2 |
| _version_ | 1849388023182000128 |
|---|---|
| author | Thanh V T Tran; Nhat Khang Ngo; Viet Thanh Duy Nguyen; Truong-Son Hy |
| author_facet | Thanh V T Tran; Nhat Khang Ngo; Viet Thanh Duy Nguyen; Truong-Son Hy |
| author_sort | Thanh V T Tran |
| collection | DOAJ |
| description | Directed evolution (DE) has been the most effective method for protein engineering that optimizes biological functionalities through a resource-intensive process of screening or selecting among a vast range of mutations. To mitigate this extensive procedure, recent advancements in machine learning-guided methodologies center around the establishment of a surrogate sequence-function model. In this paper, we propose latent-based DE (LDE), an evolutionary algorithm designed to prioritize the exploration of high-fitness mutants in the latent space. At its core, LDE is a regularized variational autoencoder (VAE), harnessing the capabilities of the state-of-the-art protein language model, ESM-2, to construct a meaningful latent space of sequences. From this encoded representation, we present a novel approach for efficient traversal on the fitness landscape, employing a combination of gradient-based methods and DE. Experimental evaluations conducted on eight protein sequence design tasks demonstrate the superior performance of our proposed LDE over previous baseline algorithms. Our implementation is publicly available at https://github.com/HySonLab/LatentDE . |
| format | Article |
| id | doaj-art-c75e584c82224ccb8e1ff8dd8380ed69 |
| institution | Kabale University |
| issn | 2632-2153 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IOP Publishing |
| record_format | Article |
| series | Machine Learning: Science and Technology |
| spelling | doaj-art-c75e584c82224ccb8e1ff8dd8380ed69; 2025-08-20T03:42:25Z; eng; IOP Publishing; Machine Learning: Science and Technology; 2632-2153; 2025-01-01; vol. 6, no. 1, 015070; 10.1088/2632-2153/adc2e2; LatentDE: latent-based directed evolution for protein sequence design; Authors: (0) Thanh V T Tran, https://orcid.org/0000-0001-8663-1652; (1) Nhat Khang Ngo; (2) Viet Thanh Duy Nguyen, https://orcid.org/0009-0001-8319-3033; (3) Truong-Son Hy, https://orcid.org/0000-0002-5092-3757. Affiliations, in author order: (0) FPT Software AI Center, Hanoi, Vietnam; (1) FPT Software AI Center, Hanoi, Vietnam; (2) FPT Software AI Center, Hanoi, Vietnam, and University of Alabama at Birmingham, Birmingham, AL 35294, United States of America; (3) University of Alabama at Birmingham, Birmingham, AL 35294, United States of America. Abstract: Directed evolution (DE) has been the most effective method for protein engineering that optimizes biological functionalities through a resource-intensive process of screening or selecting among a vast range of mutations. To mitigate this extensive procedure, recent advancements in machine learning-guided methodologies center around the establishment of a surrogate sequence-function model. In this paper, we propose latent-based DE (LDE), an evolutionary algorithm designed to prioritize the exploration of high-fitness mutants in the latent space. At its core, LDE is a regularized variational autoencoder (VAE), harnessing the capabilities of the state-of-the-art protein language model, ESM-2, to construct a meaningful latent space of sequences. From this encoded representation, we present a novel approach for efficient traversal on the fitness landscape, employing a combination of gradient-based methods and DE. Experimental evaluations conducted on eight protein sequence design tasks demonstrate the superior performance of our proposed LDE over previous baseline algorithms. Our implementation is publicly available at https://github.com/HySonLab/LatentDE . URL: https://doi.org/10.1088/2632-2153/adc2e2; Keywords: directed evolution; protein representation learning; protein design; latent-based optimization; evolutionary algorithm |
| spellingShingle | Thanh V T Tran; Nhat Khang Ngo; Viet Thanh Duy Nguyen; Truong-Son Hy; LatentDE: latent-based directed evolution for protein sequence design; Machine Learning: Science and Technology; directed evolution; protein representation learning; protein design; latent-based optimization; evolutionary algorithm |
| title | LatentDE: latent-based directed evolution for protein sequence design |
| title_full | LatentDE: latent-based directed evolution for protein sequence design |
| title_fullStr | LatentDE: latent-based directed evolution for protein sequence design |
| title_full_unstemmed | LatentDE: latent-based directed evolution for protein sequence design |
| title_short | LatentDE: latent-based directed evolution for protein sequence design |
| title_sort | latentde latent based directed evolution for protein sequence design |
| topic | directed evolution; protein representation learning; protein design; latent-based optimization; evolutionary algorithm |
| url | https://doi.org/10.1088/2632-2153/adc2e2 |
| work_keys_str_mv | AT thanhvttran latentdelatentbaseddirectedevolutionforproteinsequencedesign AT nhatkhangngo latentdelatentbaseddirectedevolutionforproteinsequencedesign AT vietthanhduynguyen latentdelatentbaseddirectedevolutionforproteinsequencedesign AT truongsonhy latentdelatentbaseddirectedevolutionforproteinsequencedesign |
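
The description above says LDE traverses the fitness landscape in a latent space by combining gradient-based methods with DE. As a rough illustration of that general idea only, the following toy sketch alternates a gradient-ascent step with a mutate-and-select round on plain latent vectors. It is not the authors' implementation (that is at the GitHub URL in the description): there is no VAE or ESM-2 here, and the quadratic surrogate, the function names, and every hyperparameter are hypothetical choices made for this example.

```python
import numpy as np


def surrogate_fitness(z, z_opt):
    """Toy differentiable surrogate (assumption): maximum value 0 at z_opt."""
    return -np.sum((z - z_opt) ** 2, axis=-1)


def surrogate_grad(z, z_opt):
    """Analytic gradient of the toy surrogate with respect to z."""
    return -2.0 * (z - z_opt)


def latent_directed_evolution(z_init, z_opt, rounds=20, step=0.1,
                              n_mutants=32, noise=0.05, top_k=8, seed=0):
    """Alternate gradient ascent and mutate-and-select on latent vectors."""
    rng = np.random.default_rng(seed)
    population = np.atleast_2d(z_init).astype(float)
    for _ in range(rounds):
        # Gradient-based move: one ascent step on the surrogate per parent.
        population = population + step * surrogate_grad(population, z_opt)
        # Evolution-style move: Gaussian "mutations" around each parent.
        n_parents, dim = population.shape
        mutants = population[:, None, :] + noise * rng.standard_normal(
            (n_parents, n_mutants, dim))
        candidates = np.concatenate([population, mutants.reshape(-1, dim)])
        # Selection: keep the top_k candidates ranked by surrogate fitness.
        scores = surrogate_fitness(candidates, z_opt)
        population = candidates[np.argsort(scores)[-top_k:]]
    return population, surrogate_fitness(population, z_opt)


if __name__ == "__main__":
    start = np.zeros(4)   # hypothetical starting latent code
    target = np.ones(4)   # hypothetical optimum of the toy surrogate
    final_pop, final_fit = latent_directed_evolution(start, target)
    print("best surrogate fitness:", final_fit.max())
```

In a real pipeline the latent vectors would come from an encoder and the selected vectors would be decoded back to sequences; the sketch keeps parents among the candidates so the best surrogate score never gets worse from one round to the next.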