Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models

Abstract: The field of molecular representation has witnessed a shift towards models trained on molecular structures represented by strings or graphs, with chemical information encoded in nodes and bonds. Graph-based representations offer a more realistic depiction and support 3D geometry and conformer-based augmentation. Graph Neural Networks (GNNs) and Graph-based Transformer models (GTs) represent two paradigms in this field, with GT models emerging as a flexible alternative. In this study, we compare the performance of GT models against GNN models on three datasets. We explore the impact of training procedures, including context-enriched training through pretraining on quantum mechanical atomic-level properties and auxiliary task training. Our analysis focuses on sterimol parameter estimation, binding energy estimation, and generalization performance for transition metal complexes. We find that GT models with context-enriched training provide results on par with GNN models, with the added advantages of speed and flexibility. Our findings highlight the potential of GT models as a valid alternative for molecular representation learning tasks.

Bibliographic Details
Main Authors: Cecile Valsecchi, Jose A. Arjona-Medina, Natalia Dyubankova, Ramil Nugmanov
Affiliations: Discovery, Product Development and Supply, Janssen Cilag S.p.a. (Valsecchi, Arjona-Medina); Discovery, Product Development and Supply, Janssen Pharmaceutica N.V. (Dyubankova, Nugmanov)
Format: Article
Language: English
Published: BMC, 2025-05-01
Series: Journal of Cheminformatics
ISSN: 1758-2946
Subjects: Drug discovery; QSAR; Cheminformatics; Graph; Transformers; Deep learning
Online Access: https://doi.org/10.1186/s13321-025-01004-5