Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models
Abstract The field of molecular representation has witnessed a shift towards models trained on molecular structures represented by strings or graphs, with chemical information encoded in nodes and bonds. Graph-based representations offer a more realistic depiction and support 3D geometry and conformer-based augmentation. Graph Neural Networks (GNNs) and Graph-based Transformer models (GTs) represent two paradigms in this field, with GT models emerging as a flexible alternative. In this study, we compare the performance of GT models against GNN models on three datasets. We explore the impact of training procedures, including context-enriched training through pretraining on quantum mechanical atomic-level properties and auxiliary task training. Our analysis focuses on sterimol parameters estimation, binding energy estimation, and generalization performance for transition metal complexes. We find that GT models with context-enriched training provide on par results compared to GNN models, with the added advantages of speed and flexibility. Our findings highlight the potential of GT models as a valid alternative for molecular representation learning tasks.
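The abstract contrasts GNN message passing over atom (node) and bond (edge) features with graph-based transformers. As a minimal illustrative sketch only, not the models benchmarked in the article, one synchronous message-passing step over a toy molecular graph could look like this (`message_passing_step`, the feature vectors, and the bond weights are invented placeholders):

```python
# Minimal sketch of one message-passing step over a molecular graph.
# Illustrative placeholder code, not the benchmarked GNN/GT models:
# atoms carry feature vectors, bonds carry scalar weights, and each
# atom accumulates bond-weighted features from its neighbours.

def message_passing_step(node_feats, edges):
    """node_feats: {atom_id: [float, ...]}; edges: [(i, j, bond_weight), ...].

    Returns updated features; reads only the pre-update values, so the
    update is synchronous across all atoms.
    """
    new_feats = {i: list(f) for i, f in node_feats.items()}
    for i, j, w in edges:
        for k in range(len(node_feats[i])):
            new_feats[i][k] += w * node_feats[j][k]
            new_feats[j][k] += w * node_feats[i][k]
    return new_feats

# Toy 3-atom chain with a one-dimensional "atom type" feature.
feats = {0: [1.0], 1: [1.0], 2: [2.0]}
bonds = [(0, 1, 1.0), (1, 2, 1.0)]
print(message_passing_step(feats, bonds))  # → {0: [2.0], 1: [4.0], 2: [3.0]}
```

Stacking several such steps lets each atom's representation absorb information from progressively larger neighbourhoods, which is the intuition behind the GNN side of the comparison.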
| Main Authors: | Cecile Valsecchi, Jose A. Arjona-Medina, Natalia Dyubankova, Ramil Nugmanov |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-05-01 |
| Series: | Journal of Cheminformatics |
| Subjects: | Drug discovery; QSAR; Cheminformatics; Graph; Transformers; Deep learning |
| Online Access: | https://doi.org/10.1186/s13321-025-01004-5 |
| _version_ | 1849730860951011328 |
|---|---|
| author | Cecile Valsecchi; Jose A. Arjona-Medina; Natalia Dyubankova; Ramil Nugmanov |
| author_facet | Cecile Valsecchi; Jose A. Arjona-Medina; Natalia Dyubankova; Ramil Nugmanov |
| author_sort | Cecile Valsecchi |
| collection | DOAJ |
| description | Abstract The field of molecular representation has witnessed a shift towards models trained on molecular structures represented by strings or graphs, with chemical information encoded in nodes and bonds. Graph-based representations offer a more realistic depiction and support 3D geometry and conformer-based augmentation. Graph Neural Networks (GNNs) and Graph-based Transformer models (GTs) represent two paradigms in this field, with GT models emerging as a flexible alternative. In this study, we compare the performance of GT models against GNN models on three datasets. We explore the impact of training procedures, including context-enriched training through pretraining on quantum mechanical atomic-level properties and auxiliary task training. Our analysis focuses on sterimol parameters estimation, binding energy estimation, and generalization performance for transition metal complexes. We find that GT models with context-enriched training provide on par results compared to GNN models, with the added advantages of speed and flexibility. Our findings highlight the potential of GT models as a valid alternative for molecular representation learning tasks. |
| format | Article |
| id | doaj-art-dbb41081feaf49c8bbed9ab955666e93 |
| institution | DOAJ |
| issn | 1758-2946 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | BMC |
| record_format | Article |
| series | Journal of Cheminformatics |
| spelling | doaj-art-dbb41081feaf49c8bbed9ab955666e93 2025-08-20T03:08:44Z eng BMC Journal of Cheminformatics 1758-2946 2025-05-01 Vol. 17, Iss. 1, pp. 1-15 10.1186/s13321-025-01004-5 Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models Cecile Valsecchi (Discovery, Product Development and Supply, Janssen Cilag S.p.a.); Jose A. Arjona-Medina (Discovery, Product Development and Supply, Janssen Cilag S.p.a.); Natalia Dyubankova (Discovery, Product Development and Supply, Janssen Pharmaceutica N.V.); Ramil Nugmanov (Discovery, Product Development and Supply, Janssen Pharmaceutica N.V.) https://doi.org/10.1186/s13321-025-01004-5 Drug discovery; QSAR; Cheminformatics; Graph; Transformers; Deep learning |
| spellingShingle | Cecile Valsecchi; Jose A. Arjona-Medina; Natalia Dyubankova; Ramil Nugmanov; Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models; Journal of Cheminformatics; Drug discovery; QSAR; Cheminformatics; Graph; Transformers; Deep learning |
| title | Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models |
| title_full | Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models |
| title_fullStr | Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models |
| title_full_unstemmed | Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models |
| title_short | Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models |
| title_sort | benchmarking molecular conformer augmentation with context enriched training graph based transformer versus gnn models |
| topic | Drug discovery; QSAR; Cheminformatics; Graph; Transformers; Deep learning |
| url | https://doi.org/10.1186/s13321-025-01004-5 |
| work_keys_str_mv | AT cecilevalsecchi benchmarkingmolecularconformeraugmentationwithcontextenrichedtraininggraphbasedtransformerversusgnnmodels AT joseaarjonamedina benchmarkingmolecularconformeraugmentationwithcontextenrichedtraininggraphbasedtransformerversusgnnmodels AT nataliadyubankova benchmarkingmolecularconformeraugmentationwithcontextenrichedtraininggraphbasedtransformerversusgnnmodels AT ramilnugmanov benchmarkingmolecularconformeraugmentationwithcontextenrichedtraininggraphbasedtransformerversusgnnmodels |