VAE-Assisted Data Augmentation for Improved Molecular Prediction with Graph Neural Networks (GNNs) in Low-Data Regimes

Bibliographic Details
Main Authors: Gabriela C. Theis Marchan, Pegah Naghshnejad, Andrew Okafor, Jose A. Romagnoli
Format: Article
Language: English
Published: AIDIC Servizi S.r.l. 2025-07-01
Series: Chemical Engineering Transactions
Online Access: https://www.cetjournal.it/index.php/cet/article/view/15421
Description
Summary: This study presents a novel approach to enhancing molecular property prediction through variational autoencoder (VAE)-assisted data augmentation in low-data regimes. The methodology combines graph neural networks (GNNs) with VAEs to improve predictive accuracy on two molecular datasets from MoleculeNet: ESOL (water solubility) and FreeSolv (hydration free energy). By generating chemically valid molecules that align with the original dataset's chemical space, the approach enhances model performance, particularly for graph attention networks (GATs). For GAT models trained on the augmented datasets, R² increased from 0.879 to 0.918 on FreeSolv and from 0.873 to 0.885 on ESOL. The study validates the VAE-generated molecules through chemical space analysis and property distribution comparisons, offering a promising solution for molecular property prediction in data-limited scenarios.
ISSN: 2283-9216
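
Note on the reported metric: the summary gives its results as R² values. As a minimal sketch (not the authors' evaluation code), the snippet below shows the standard definition of R² and uses the four figures quoted in the summary to express each gain as a relative reduction in unexplained variance. The function names and the `REPORTED_R2` dictionary are hypothetical helpers introduced for illustration; only the four R² figures come from the record.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def unexplained_variance_reduction(r2_before, r2_after):
    """Fraction by which the unexplained variance (1 - R^2) shrinks."""
    return ((1.0 - r2_before) - (1.0 - r2_after)) / (1.0 - r2_before)


# R^2 values quoted in the summary for GAT models: (original, augmented).
REPORTED_R2 = {
    "FreeSolv": (0.879, 0.918),
    "ESOL": (0.873, 0.885),
}

if __name__ == "__main__":
    for dataset, (before, after) in REPORTED_R2.items():
        reduction = unexplained_variance_reduction(before, after)
        print(f"{dataset}: R2 {before} -> {after}, "
              f"unexplained variance down {reduction:.1%}")
```

Read this way, the FreeSolv gain corresponds to roughly a one-third drop in unexplained variance and the ESOL gain to a little under one-tenth, which is one way to compare improvements that start from different baselines.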