Improving generative inverse design of molecular catalysts in small data regime

Deep generative models are a powerful tool for exploring the chemical space within inverse-design workflows; however, their effectiveness relies on sufficient training data and effective mechanisms for guiding the model to optimize specific properties. We demonstrate that designing an expert-informed data representation and training procedure allows leveraging data augmentation while maintaining the required sampling controllability. We focus our discussion on a specific class of compounds (transition metal complexes), and a popular class of generative models (equivariant diffusion models), although we envision that the approach could be extended to other chemical spaces and model types. Through experiments, we demonstrate that augmenting the training database with generic but related unlabeled data enables a practical level of performance to be reached.


Bibliographic Details
Main Authors: François Cornet, Pratham Deshmukh, Bardi Benediktsson, Mikkel N Schmidt, Arghya Bhowmik
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Machine Learning: Science and Technology
Subjects:
Online Access: https://doi.org/10.1088/2632-2153/addc32
author François Cornet
Pratham Deshmukh
Bardi Benediktsson
Mikkel N Schmidt
Arghya Bhowmik
author_facet François Cornet
Pratham Deshmukh
Bardi Benediktsson
Mikkel N Schmidt
Arghya Bhowmik
author_sort François Cornet
collection DOAJ
description Deep generative models are a powerful tool for exploring the chemical space within inverse-design workflows; however, their effectiveness relies on sufficient training data and effective mechanisms for guiding the model to optimize specific properties. We demonstrate that designing an expert-informed data representation and training procedure allows leveraging data augmentation while maintaining the required sampling controllability. We focus our discussion on a specific class of compounds (transition metal complexes), and a popular class of generative models (equivariant diffusion models), although we envision that the approach could be extended to other chemical spaces and model types. Through experiments, we demonstrate that augmenting the training database with generic but related unlabeled data enables a practical level of performance to be reached.
format Article
id doaj-art-efdfe27d972344888bf30c366a8b16ef
institution Kabale University
issn 2632-2153
language English
publishDate 2025-01-01
publisher IOP Publishing
record_format Article
series Machine Learning: Science and Technology
spelling doaj-art-efdfe27d972344888bf30c366a8b16ef
2025-08-20T03:46:50Z
eng
IOP Publishing
Machine Learning: Science and Technology
2632-2153
2025-01-01
Volume 6, issue 2, article 025057
10.1088/2632-2153/addc32
Improving generative inverse design of molecular catalysts in small data regime
François Cornet (https://orcid.org/0009-0008-6157-862X): Department of Energy Conversion and Storage, Technical University of Denmark, Kgs. Lyngby, Denmark; Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
Pratham Deshmukh (https://orcid.org/0009-0004-1719-7310): Department of Energy Conversion and Storage, Technical University of Denmark, Kgs. Lyngby, Denmark
Bardi Benediktsson (https://orcid.org/0000-0002-1578-9126): Department of Energy Conversion and Storage, Technical University of Denmark, Kgs. Lyngby, Denmark
Mikkel N Schmidt (https://orcid.org/0000-0001-6927-8869): Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
Arghya Bhowmik (https://orcid.org/0000-0003-3198-5116): Department of Energy Conversion and Storage, Technical University of Denmark, Kgs. Lyngby, Denmark
https://doi.org/10.1088/2632-2153/addc32
inverse design
multi-task learning
data representation
transition metal complex
diffusion model
catalyst design
spellingShingle François Cornet
Pratham Deshmukh
Bardi Benediktsson
Mikkel N Schmidt
Arghya Bhowmik
Improving generative inverse design of molecular catalysts in small data regime
Machine Learning: Science and Technology
inverse design
multi-task learning
data representation
transition metal complex
diffusion model
catalyst design
title Improving generative inverse design of molecular catalysts in small data regime
title_full Improving generative inverse design of molecular catalysts in small data regime
title_fullStr Improving generative inverse design of molecular catalysts in small data regime
title_full_unstemmed Improving generative inverse design of molecular catalysts in small data regime
title_short Improving generative inverse design of molecular catalysts in small data regime
title_sort improving generative inverse design of molecular catalysts in small data regime
topic inverse design
multi-task learning
data representation
transition metal complex
diffusion model
catalyst design
url https://doi.org/10.1088/2632-2153/addc32
work_keys_str_mv AT francoiscornet improvinggenerativeinversedesignofmolecularcatalystsinsmalldataregime
AT prathamdeshmukh improvinggenerativeinversedesignofmolecularcatalystsinsmalldataregime
AT bardibenediktsson improvinggenerativeinversedesignofmolecularcatalystsinsmalldataregime
AT mikkelnschmidt improvinggenerativeinversedesignofmolecularcatalystsinsmalldataregime
AT arghyabhowmik improvinggenerativeinversedesignofmolecularcatalystsinsmalldataregime