Active learning-assisted directed evolution

Abstract Directed evolution (DE) is a powerful tool to optimize protein fitness for a specific application. However, DE can be inefficient when mutations exhibit non-additive, or epistatic, behavior. Here, we present Active Learning-assisted Directed Evolution (ALDE), an iterative machine learning-a...

Bibliographic Details
Main Authors: Jason Yang, Ravi G. Lal, James C. Bowden, Raul Astudillo, Mikhail A. Hameedi, Sukhvinder Kaur, Matthew Hill, Yisong Yue, Frances H. Arnold
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-55987-8
collection DOAJ
description Abstract Directed evolution (DE) is a powerful tool to optimize protein fitness for a specific application. However, DE can be inefficient when mutations exhibit non-additive, or epistatic, behavior. Here, we present Active Learning-assisted Directed Evolution (ALDE), an iterative machine learning-assisted DE workflow that leverages uncertainty quantification to explore the search space of proteins more efficiently than current DE methods. We apply ALDE to an engineering landscape that is challenging for DE: optimization of five epistatic residues in the active site of an enzyme. In three rounds of wet-lab experimentation, we improve the yield of a desired product of a non-native cyclopropanation reaction from 12% to 93%. We also perform computational simulations on existing protein sequence-fitness datasets to support our argument that ALDE can be more effective than DE. Overall, ALDE is a practical and broadly applicable strategy to unlock improved protein engineering outcomes.
id doaj-art-42793e27879147558f1a8afcca84d70e
institution Kabale University
issn 2041-1723
spelling doaj-art-42793e27879147558f1a8afcca84d70e
record updated 2025-01-19T12:31:36Z
language eng
publisher Nature Portfolio
series Nature Communications
issn 2041-1723
publishDate 2025-01-01
volume 16, issue 1, pages 1-12
doi 10.1038/s41467-025-55987-8
title Active learning-assisted directed evolution
author affiliations
Jason Yang: Division of Chemistry and Chemical Engineering, California Institute of Technology
Ravi G. Lal: Division of Chemistry and Chemical Engineering, California Institute of Technology
James C. Bowden: Division of Engineering and Applied Sciences, California Institute of Technology
Raul Astudillo: Division of Engineering and Applied Sciences, California Institute of Technology
Mikhail A. Hameedi: Division of Biology and Biological Engineering, California Institute of Technology
Sukhvinder Kaur: Elegen Corp
Matthew Hill: Elegen Corp
Yisong Yue: Division of Engineering and Applied Sciences, California Institute of Technology
Frances H. Arnold: Division of Chemistry and Chemical Engineering, California Institute of Technology
url https://doi.org/10.1038/s41467-025-55987-8