Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey

The precise prediction of molecular properties is essential for advancements in drug development, particularly in virtual screening and compound optimization. The recent introduction of numerous deep learning-based methods has shown remarkable potential in enhancing Molecular Property Prediction (MPP...

Bibliographic Details
Main Authors: Taojie Kuang, Pengfei Liu, Zhixiang Ren
Format: Article
Language: English
Published: Tsinghua University Press 2024-09-01
Series: Big Data Mining and Analytics
Subjects:
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020028
_version_ 1832572871744946176
author Taojie Kuang
Pengfei Liu
Zhixiang Ren
collection DOAJ
description The precise prediction of molecular properties is essential for advancements in drug development, particularly in virtual screening and compound optimization. The recent introduction of numerous deep learning-based methods has shown remarkable potential in enhancing Molecular Property Prediction (MPP), especially improving accuracy and insights into molecular structures. Yet, two critical questions arise: does the integration of domain knowledge augment the accuracy of molecular property prediction and does employing multi-modal data fusion yield more precise results than unique data source methods? To explore these matters, we comprehensively review and quantitatively analyze recent deep learning methods based on various benchmarks. We discover that integrating molecular information significantly improves Molecular Property Prediction (MPP) for both regression and classification tasks. Specifically, regression improvements, measured by reductions in Root Mean Square Error (RMSE), are up to 4.0%, while classification enhancements, measured by the area under the receiver operating characteristic curve (ROC-AUC), are up to 1.7%. Additionally, we discover that, as measured by ROC-AUC, augmenting 2D graphs with 3D information improves performance for classification tasks by up to 13.2% and enriching 2D graphs with 1D SMILES boosts multi-modal learning performance for regression tasks by up to 9.1%. The two consolidated insights offer crucial guidance for future advancements in drug discovery.
format Article
id doaj-art-4870300b0bb944759a1a547071cce1b1
institution Kabale University
issn 2096-0654
language English
publishDate 2024-09-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj-art-4870300b0bb944759a1a547071cce1b1
2025-02-02T06:29:08Z
eng
Tsinghua University Press
Big Data Mining and Analytics
2096-0654
2024-09-01
vol. 7, no. 3, pp. 858-888
10.26599/BDMA.2024.9020028
Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey
Taojie Kuang (Peng Cheng National Laboratory, Shenzhen 518000, China, and also with School of Future Technology, South China University of Technology, Guangzhou 511442, China)
Pengfei Liu (Peng Cheng National Laboratory, Shenzhen 518000, China, and also with School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China)
Zhixiang Ren (Peng Cheng National Laboratory, Shenzhen 518000, China)
https://www.sciopen.com/article/10.26599/BDMA.2024.9020028
title Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey
topic molecular property prediction (mpp)
deep learning (dl)
domain knowledge
multi-modality
drug discovery
url https://www.sciopen.com/article/10.26599/BDMA.2024.9020028