LatentDE: latent-based directed evolution for protein sequence design

Directed evolution (DE) has been the most effective method for protein engineering that optimizes biological functionalities through a resource-intensive process of screening or selecting among a vast range of mutations. To mitigate this extensive procedure, recent advancements in machine learning-g...

Bibliographic Details
Main Authors: Thanh V T Tran, Nhat Khang Ngo, Viet Thanh Duy Nguyen, Truong-Son Hy
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Machine Learning: Science and Technology
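Subjects: directed evolution; protein representation learning; protein design; latent-based optimization; evolutionary algorithm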
Online Access: https://doi.org/10.1088/2632-2153/adc2e2
_version_ 1849388023182000128
author Thanh V T Tran
Nhat Khang Ngo
Viet Thanh Duy Nguyen
Truong-Son Hy
author_sort Thanh V T Tran
collection DOAJ
description Directed evolution (DE) has been the most effective method for protein engineering that optimizes biological functionalities through a resource-intensive process of screening or selecting among a vast range of mutations. To mitigate this extensive procedure, recent advancements in machine learning-guided methodologies center around the establishment of a surrogate sequence-function model. In this paper, we propose latent-based DE (LDE), an evolutionary algorithm designed to prioritize the exploration of high-fitness mutants in the latent space. At its core, LDE is a regularized variational autoencoder (VAE), harnessing the capabilities of the state-of-the-art protein language model, ESM-2, to construct a meaningful latent space of sequences. From this encoded representation, we present a novel approach for efficient traversal on the fitness landscape, employing a combination of gradient-based methods and DE. Experimental evaluations conducted on eight protein sequence design tasks demonstrate the superior performance of our proposed LDE over previous baseline algorithms. Our implementation is publicly available at https://github.com/HySonLab/LatentDE.
format Article
id doaj-art-c75e584c82224ccb8e1ff8dd8380ed69
institution Kabale University
issn 2632-2153
language English
publishDate 2025-01-01
publisher IOP Publishing
record_format Article
series Machine Learning: Science and Technology
spelling doaj-art-c75e584c82224ccb8e1ff8dd8380ed69
2025-08-20T03:42:25Z
eng
IOP Publishing
Machine Learning: Science and Technology
2632-2153
2025-01-01
6
1
015070
10.1088/2632-2153/adc2e2
LatentDE: latent-based directed evolution for protein sequence design
Thanh V T Tran 0 https://orcid.org/0000-0001-8663-1652
Nhat Khang Ngo 1
Viet Thanh Duy Nguyen 2 https://orcid.org/0009-0001-8319-3033
Truong-Son Hy 3 https://orcid.org/0000-0002-5092-3757
FPT Software AI Center, Hanoi, Vietnam
FPT Software AI Center, Hanoi, Vietnam
FPT Software AI Center, Hanoi, Vietnam; University of Alabama at Birmingham, Birmingham, AL 35294, United States of America
University of Alabama at Birmingham, Birmingham, AL 35294, United States of America
Directed evolution (DE) has been the most effective method for protein engineering that optimizes biological functionalities through a resource-intensive process of screening or selecting among a vast range of mutations. To mitigate this extensive procedure, recent advancements in machine learning-guided methodologies center around the establishment of a surrogate sequence-function model. In this paper, we propose latent-based DE (LDE), an evolutionary algorithm designed to prioritize the exploration of high-fitness mutants in the latent space. At its core, LDE is a regularized variational autoencoder (VAE), harnessing the capabilities of the state-of-the-art protein language model, ESM-2, to construct a meaningful latent space of sequences. From this encoded representation, we present a novel approach for efficient traversal on the fitness landscape, employing a combination of gradient-based methods and DE. Experimental evaluations conducted on eight protein sequence design tasks demonstrate the superior performance of our proposed LDE over previous baseline algorithms. Our implementation is publicly available at https://github.com/HySonLab/LatentDE.
https://doi.org/10.1088/2632-2153/adc2e2
directed evolution
protein representation learning
protein design
latent-based optimization
evolutionary algorithm
title LatentDE: latent-based directed evolution for protein sequence design
title_sort latentde latent based directed evolution for protein sequence design
topic directed evolution
protein representation learning
protein design
latent-based optimization
evolutionary algorithm
url https://doi.org/10.1088/2632-2153/adc2e2
work_keys_str_mv AT thanhvttran latentdelatentbaseddirectedevolutionforproteinsequencedesign
AT nhatkhangngo latentdelatentbaseddirectedevolutionforproteinsequencedesign
AT vietthanhduynguyen latentdelatentbaseddirectedevolutionforproteinsequencedesign
AT truongsonhy latentdelatentbaseddirectedevolutionforproteinsequencedesign