Multi-channel learning for integrating structural hierarchies into context-dependent molecular representation

Bibliographic Details
Main Authors: Yue Wan, Jialu Wu, Tingjun Hou, Chang-Yu Hsieh, Xiaowei Jia
Format: Article
Language: English
Published: Nature Portfolio, 2025-01-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-024-55082-4
ISSN: 2041-1723
Affiliations: Yue Wan, Xiaowei Jia (University of Pittsburgh, Department of Computer Science); Jialu Wu, Tingjun Hou, Chang-Yu Hsieh (Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University)

Abstract
Reliable molecular property prediction is essential for various scientific endeavors and industrial applications, such as drug discovery. However, data scarcity, combined with the highly non-linear causal relationships between physicochemical and biological properties and conventional molecular featurization schemes, complicates the development of robust molecular machine learning models. Self-supervised learning (SSL) has emerged as a popular solution, using large-scale, unannotated molecular data to learn a foundational representation of chemical space that may benefit downstream tasks. Yet existing molecular SSL methods largely overlook chemical knowledge, including molecular structure similarity, scaffold composition, and the context-dependent aspects of molecular properties when operating over chemical space. They also struggle to learn subtle variations in structure-activity relationships. This paper introduces a multi-channel pre-training framework that learns robust and generalizable chemical knowledge. It leverages the structural hierarchy within the molecule, embeds each level through a distinct pre-training task in its own channel, and aggregates channel information in a task-specific manner during fine-tuning. Our approach demonstrates competitive performance across various molecular property benchmarks and offers strong advantages in particularly challenging yet ubiquitous scenarios such as activity cliffs.
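The abstract states that channel information is aggregated in a task-specific manner during fine-tuning but does not say how. The sketch below shows one generic way such aggregation could work, assuming each channel produces a fixed-length embedding per molecule and a task-specific query vector (learned during fine-tuning in a real model, fixed to toy values here) scores each channel. The function `aggregate_channels`, the channel names `molecule`, `scaffold` and `context` (loosely mirroring the structure similarity, scaffold composition and context-dependent knowledge the abstract names), and the softmax weighting scheme are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_channels(
    channel_embeddings: Dict[str, List[float]],
    task_query: List[float],
) -> Tuple[Dict[str, float], List[float]]:
    """Fuse per-channel embeddings of one molecule using task-specific weights.

    Hypothetical scheme (not the paper's code): score each channel by the dot
    product of its embedding with a task query vector, softmax-normalize the
    scores, and return the weighted sum of the channel embeddings.
    """
    dim = len(task_query)
    names = sorted(channel_embeddings)  # fixed order for reproducibility
    for name in names:
        if len(channel_embeddings[name]) != dim:
            raise ValueError(f"channel {name!r} has wrong dimension")

    # Relevance of each channel to the current downstream task.
    scores = [
        sum(x * q for x, q in zip(channel_embeddings[name], task_query))
        for name in names
    ]
    weights = softmax(scores)

    # Weighted sum: a convex combination of the channel embeddings.
    fused = [0.0] * dim
    for weight, name in zip(weights, names):
        for i, value in enumerate(channel_embeddings[name]):
            fused[i] += weight * value
    return dict(zip(names, weights)), fused


# Toy usage: three hypothetical channels with 3-dimensional embeddings.
channels = {
    "molecule": [1.0, 0.0, 0.0],
    "scaffold": [0.0, 1.0, 0.0],
    "context": [0.0, 0.0, 1.0],
}
weights, fused = aggregate_channels(channels, task_query=[2.0, 0.0, 0.0])
print(weights)  # "molecule" gets the largest weight; the other two tie
print(fused)
```

Softmax weighting keeps every channel weight positive and the weights summing to one, so the fused vector is a convex combination of the channel embeddings; a different task query shifts the emphasis to whichever channel it aligns with.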