Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models

Bibliographic Details
Main Authors: Cecile Valsecchi, Jose A. Arjona-Medina, Natalia Dyubankova, Ramil Nugmanov
Format: Article
Language: English
Published: BMC 2025-05-01
Series: Journal of Cheminformatics
Online Access: https://doi.org/10.1186/s13321-025-01004-5
Description
Summary: Abstract The field of molecular representation has witnessed a shift towards models trained on molecular structures represented by strings or graphs, with chemical information encoded in nodes and bonds. Graph-based representations offer a more realistic depiction of molecules and support 3D geometry and conformer-based augmentation. Graph Neural Networks (GNNs) and Graph-based Transformer models (GTs) represent two paradigms in this field, with GT models emerging as a flexible alternative. In this study, we compare the performance of GT models against GNN models on three datasets. We explore the impact of training procedures, including context-enriched training through pretraining on quantum-mechanical atomic-level properties and auxiliary-task training. Our analysis focuses on Sterimol parameter estimation, binding energy estimation, and generalization performance for transition metal complexes. We find that GT models with context-enriched training perform on par with GNN models, with the added advantages of speed and flexibility. Our findings highlight the potential of GT models as a valid alternative for molecular representation learning tasks.
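To make the graph-based representation mentioned in the abstract concrete, here is a minimal sketch (not the authors' code, and deliberately library-free): atoms become nodes carrying element features and bonds become edges carrying bond orders, with ethanol hard-coded as a toy example.

```python
# Hypothetical illustration of a molecular graph: nodes = atoms, edges = bonds.
# Ethanol (C-C-O) is hard-coded; a real pipeline would parse SMILES or a 3D conformer.

ethanol_atoms = ["C", "C", "O"]              # node labels (element symbols)
ethanol_bonds = [(0, 1, 1.0), (1, 2, 1.0)]   # edges as (begin, end, bond order)

def build_graph(atoms, bonds):
    """Return a node feature list and an adjacency map encoding bond orders."""
    nodes = [{"index": i, "element": el} for i, el in enumerate(atoms)]
    adjacency = {i: [] for i in range(len(atoms))}
    for a, b, order in bonds:
        adjacency[a].append((b, order))  # molecular graphs are undirected,
        adjacency[b].append((a, order))  # so store each bond in both directions
    return nodes, adjacency

nodes, adjacency = build_graph(ethanol_atoms, ethanol_bonds)
print(len(nodes))    # 3 atoms
print(adjacency[1])  # the central carbon bonds to atom 0 and atom 2
```

Both GNN and GT models consume such a structure: GNNs pass messages along the adjacency map, while graph transformers attend over all node pairs, optionally biased by graph or 3D distances.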
ISSN:1758-2946