Cycle-configuration descriptors: a novel graph-theoretic approach to enhancing molecular inference

Bibliographic Details
Main Authors: Bowen Song, Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu
Format: Article
Language: English
Published: BMC 2025-08-01
Series: Journal of Cheminformatics
Online Access: https://doi.org/10.1186/s13321-025-01042-z
ISSN: 1758-2946
Collection: DOAJ

Abstract: Inference of molecules with desired activities/properties is one of the key and challenging problems in cheminformatics and bioinformatics. For that purpose, our research group has recently developed a state-of-the-art framework, mol-infer, for molecular inference. This framework first constructs a prediction function for a fixed property using machine learning models, and then simulates this function by mixed-integer linear programming to infer desired molecules. The accuracy of the framework relies heavily on the representation power of its descriptors. In this study, we highlight a typical class of non-isomorphic chemical graphs that have noticeably different property values but cannot be distinguished by the standard “two-layered (2L) model” of mol-infer. To address this distinguishability problem of the 2L model, we propose a novel family of descriptors, named cycle-configuration (CC) descriptors, which capture the ortho/meta/para patterns that appear in aromatic rings, a capability the framework previously lacked. Extensive computational experiments show that, with the new descriptors, we can construct prediction functions whose performance is similar to or better than that of our previous studies for all 44 tested chemical properties (27 regression datasets and 17 classification datasets), confirming the effectiveness of the CC descriptors. For inference, we also formulate the CC descriptors as a system of linear constraints. We demonstrate that a chemical graph with up to 50 non-hydrogen vertices can be inferred within a practical time frame.
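The abstract's claim that ortho/meta/para patterns can escape purely local descriptors can be illustrated with a toy example. The Python sketch below is not the paper's CC descriptor definition and not the 2L model of mol-infer. It assumes hydrogen-suppressed graphs of dimethylbenzenes (xylenes), uses a hypothetical stand-in local descriptor (edge counts by the degrees of the two end vertices), and a hypothetical ring-distance count (distance 1 = ortho, 2 = meta, 3 = para on a six-membered ring). All function names here are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def xylene(positions):
    """Hypothetical helper: build a hydrogen-suppressed dimethylbenzene-like graph.

    Returns (adj, ring): adj maps vertex -> set of neighbours, ring is the
    ordered list of the six ring vertices. Methyl groups sit on `positions`.
    """
    ring = [f"c{i}" for i in range(6)]
    adj = {v: set() for v in ring}
    for i in range(6):  # close the six-membered ring
        a, b = ring[i], ring[(i + 1) % 6]
        adj[a].add(b)
        adj[b].add(a)
    for k, p in enumerate(positions):  # attach one methyl vertex per position
        m = f"m{k}"
        adj[m] = {ring[p]}
        adj[ring[p]].add(m)
    return adj, ring


def degree_pair_counts(adj):
    """Hypothetical stand-in local descriptor: count edges by sorted end-vertex degrees."""
    counts = Counter()
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                counts[tuple(sorted((len(adj[u]), len(adj[v]))))] += 1
    return counts


def ring_pattern_counts(adj, ring):
    """Hypothetical ring descriptor: count substituted-atom pairs by ring distance."""
    ring_set = set(ring)
    n = len(ring)
    # A ring atom counts as substituted if it has a neighbour outside the ring.
    subst = [i for i, v in enumerate(ring) if adj[v] - ring_set]
    names = {1: "ortho", 2: "meta", 3: "para"}  # valid for a six-membered ring
    counts = Counter()
    for i, j in combinations(subst, 2):
        d = min(abs(i - j), n - abs(i - j))  # shortest distance around the ring
        counts[names[d]] += 1
    return counts


meta_adj, meta_ring = xylene([0, 2])
para_adj, para_ring = xylene([0, 3])
print(degree_pair_counts(meta_adj) == degree_pair_counts(para_adj))  # True
print(ring_pattern_counts(meta_adj, meta_ring))  # Counter({'meta': 1})
print(ring_pattern_counts(para_adj, para_ring))  # Counter({'para': 1})
```

In this toy setting, meta- and para-xylene receive identical degree-pair counts (four degree-2/degree-3 edges, two degree-2/degree-2 edges, two methyl edges) yet differ in the ring-distance count. How the actual CC descriptors are defined and linearized is specified in the article itself; the sketch only mirrors the distinguishability idea.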

Affiliations:
Bowen Song: Graduate School of Informatics, Kyoto University
Jianshen Zhu: Graduate School of Informatics, Kyoto University
Naveed Ahmed Azam: Department of Mathematics, Quaid-i-Azam University
Kazuya Haraguchi: Graduate School of Informatics, Kyoto University
Liang Zhao: Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University
Tatsuya Akutsu: Bioinformatics Center, Institute for Chemical Research, Kyoto University

Subjects: Inverse QSAR/QSPR; Molecular inference; Descriptor design; Mixed integer linear programming; Machine learning