Context-dependent similarity searching for small molecular fragments

Abstract Similarity searching is a mainstay in cheminformatics that is generally used to identify compounds with desired properties. For small molecular fragments, similarity calculations based on standard descriptors often have limited utility for establishing meaningful similarity relationships du...

Bibliographic Details
Main Authors: Atsushi Yoshimori, Jürgen Bajorath
Format: Article
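Language: English
Published: BMC 2025-05-01
Series: Journal of Cheminformatics
Subjects: Molecular similarity; Context-dependent similarity; Small molecular fragments; Substituents; Similarity-property principle; Chemical similarity searching
Online Access: https://doi.org/10.1186/s13321-025-01032-1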
collection DOAJ
description Abstract Similarity searching is a mainstay in cheminformatics that is generally used to identify compounds with desired properties. For small molecular fragments, similarity calculations based on standard descriptors often have limited utility for establishing meaningful similarity relationships due to feature sparseness. As an alternative, we have adapted the concept of context-dependent word pair similarity from natural language processing to evaluate similarity relationships between substituents (R-groups) taking latent characteristics into account. Context-dependent similarity assessment is based on vector embeddings as fragment representations generated using neural networks. With active analogue series as a model system to establish a global structure–activity context, we demonstrate that this approach is applicable to systematic similarity searching for substituents and increases the performance of standard descriptor representations. Context-dependent similarity searching is capable of detecting remote and functionally relevant similarity relationships between substituents. Alternative search queries are introduced focusing on individual substituents within a global substituent context or individual sequences of substituents establishing a local context. For similarity searching, different structural or structure–property contexts can be established, providing opportunities for various applications.
id doaj-art-66edbf06b4d94cdc975e6c290371445d
institution DOAJ
issn 1758-2946
affiliations Atsushi Yoshimori: Institute for Theoretical Medicine, Inc.
Jürgen Bajorath: Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn
citation Journal of Cheminformatics, volume 17, issue 1, pages 1-11 (2025-05-01)
topic Molecular similarity
Context-dependent similarity
Small molecular fragments
Substituents
Similarity-property principle
Chemical similarity searching