Improving generative inverse design of molecular catalysts in small data regime
Deep generative models are a powerful tool for exploring the chemical space within inverse-design workflows; however, their effectiveness relies on sufficient training data and effective mechanisms for guiding the model to optimize specific properties. We demonstrate that designing an expert-informed data representation and training procedure allows leveraging data augmentation while maintaining the required sampling controllability. We focus our discussion on a specific class of compounds (transition metal complexes), and a popular class of generative models (equivariant diffusion models), although we envision that the approach could be extended to other chemical spaces and model types. Through experiments, we demonstrate that augmenting the training database with generic but related unlabeled data enables a practical level of performance to be reached.
| Main Authors: | François Cornet, Pratham Deshmukh, Bardi Benediktsson, Mikkel N Schmidt, Arghya Bhowmik |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IOP Publishing, 2025-01-01 |
| Series: | Machine Learning: Science and Technology |
| Subjects: | inverse design; multi-task learning; data representation; transition metal complex; diffusion model; catalyst design |
| Online Access: | https://doi.org/10.1088/2632-2153/addc32 |

| author | François Cornet, Pratham Deshmukh, Bardi Benediktsson, Mikkel N Schmidt, Arghya Bhowmik |
|---|---|
| collection | DOAJ |
| description | Deep generative models are a powerful tool for exploring the chemical space within inverse-design workflows; however, their effectiveness relies on sufficient training data and effective mechanisms for guiding the model to optimize specific properties. We demonstrate that designing an expert-informed data representation and training procedure allows leveraging data augmentation while maintaining the required sampling controllability. We focus our discussion on a specific class of compounds (transition metal complexes), and a popular class of generative models (equivariant diffusion models), although we envision that the approach could be extended to other chemical spaces and model types. Through experiments, we demonstrate that augmenting the training database with generic but related unlabeled data enables a practical level of performance to be reached. |
| format | Article |
| id | doaj-art-efdfe27d972344888bf30c366a8b16ef |
| institution | Kabale University |
| issn | 2632-2153 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IOP Publishing |
| record_format | Article |
| series | Machine Learning: Science and Technology |
| spelling | doaj-art-efdfe27d972344888bf30c366a8b16ef (harvested 2025-08-20T03:46:50Z); English; IOP Publishing; Machine Learning: Science and Technology; ISSN 2632-2153; published 2025-01-01; vol. 6, no. 2, article 025057; doi:10.1088/2632-2153/addc32; "Improving generative inverse design of molecular catalysts in small data regime"; François Cornet (https://orcid.org/0009-0008-6157-862X, Department of Energy Conversion and Storage and Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark); Pratham Deshmukh (https://orcid.org/0009-0004-1719-7310, Department of Energy Conversion and Storage, Technical University of Denmark, Kgs. Lyngby, Denmark); Bardi Benediktsson (https://orcid.org/0000-0002-1578-9126, Department of Energy Conversion and Storage, Technical University of Denmark, Kgs. Lyngby, Denmark); Mikkel N Schmidt (https://orcid.org/0000-0001-6927-8869, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark); Arghya Bhowmik (https://orcid.org/0000-0003-3198-5116, Department of Energy Conversion and Storage, Technical University of Denmark, Kgs. Lyngby, Denmark); abstract as in the description field above; https://doi.org/10.1088/2632-2153/addc32; keywords: inverse design, multi-task learning, data representation, transition metal complex, diffusion model, catalyst design |
| title | Improving generative inverse design of molecular catalysts in small data regime |
| topic | inverse design; multi-task learning; data representation; transition metal complex; diffusion model; catalyst design |
| url | https://doi.org/10.1088/2632-2153/addc32 |