Large language models design sequence-defined macromolecules via evolutionary optimization

Abstract We demonstrate the ability of a large language model to perform evolutionary optimization for materials discovery. Anthropic’s Claude 3.5 model outperforms an active learning scheme with handcrafted surrogate models and an evolutionary algorithm in selecting monomer sequences to produce tar...

Bibliographic Details
Main Authors: Wesley F. Reinhart, Antonia Statt
Format: Article
Language: English
Published: Nature Portfolio 2024-11-01
Series: npj Computational Materials
Online Access: https://doi.org/10.1038/s41524-024-01449-6
collection DOAJ
description Abstract We demonstrate the ability of a large language model to perform evolutionary optimization for materials discovery. Anthropic’s Claude 3.5 model outperforms an active learning scheme with handcrafted surrogate models and an evolutionary algorithm in selecting monomer sequences to produce targeted morphologies in macromolecular self-assembly. Utilizing pre-trained language models can potentially reduce the need for hyperparameter tuning while offering new capabilities such as self-reflection. The model performs this task effectively with or without context about the task itself, but domain-specific context sometimes results in faster convergence to good solutions. Furthermore, when this context is withheld, the model infers an approximate notion of the task (e.g., calling it a protein folding problem). This work provides evidence of Claude 3.5’s ability to act as an evolutionary optimizer, a recently discovered emergent behavior of large language models, and demonstrates a practical use case in the study and design of soft materials.
id doaj-art-959e902afa294e5fb418b545b30ec3ed
institution Kabale University
issn 2057-3960
spelling doaj-art-959e902afa294e5fb418b545b30ec3ed; 2024-11-24T12:35:35Z; eng; Nature Portfolio; npj Computational Materials; 2057-3960; 2024-11-01; 10118; 10.1038/s41524-024-01449-6; Large language models design sequence-defined macromolecules via evolutionary optimization; Wesley F. Reinhart (Department of Materials Science and Engineering, Pennsylvania State University); Antonia Statt (Department of Materials Science and Engineering, Grainger College of Engineering, University of Illinois Urbana-Champaign); https://doi.org/10.1038/s41524-024-01449-6