ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction
Main Authors: | Dong Wang, Jieyu Jin, Guqin Shi, Jingxiao Bao, Zheng Wang, Shimeng Li, Peichen Pan, Dan Li, Yu Kang, Tingjun Hou |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2025-01-01 |
Series: | Journal of Cheminformatics |
Subjects: | Caco-2 permeability; Machine learning; Matched molecular pair |
Online Access: | https://doi.org/10.1186/s13321-025-00947-z |
author | Dong Wang, Jieyu Jin, Guqin Shi, Jingxiao Bao, Zheng Wang, Shimeng Li, Peichen Pan, Dan Li, Yu Kang, Tingjun Hou |
---|---|
collection | DOAJ |
description | Abstract The Caco-2 cell model has been widely used to assess the intestinal permeability of drug candidates in vitro, owing to its morphological and functional similarity to human enterocytes. Although the Caco-2 cell assay is considered safe and cost-effective, it is time-consuming. Computational models that predict Caco-2 permeability with high accuracy are therefore crucial for improving the efficiency of oral drug development. In this study, we analyzed the characteristics of an augmented Caco-2 permeability dataset in depth and evaluated a diverse range of machine learning algorithms in combination with different molecular representations. XGBoost generally provided better predictions on the test sets than the other models evaluated. We also investigated the transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets. On Shanghai Qilu’s in-house dataset, the boosting models retained a degree of predictive power when applied to industry data. Y-randomization tests and applicability domain analysis were employed to assess the robustness and generalizability of these models, and matched molecular pair analysis (MMPA) was used to extract chemical transformation rules. We believe that the model developed in this study could serve as a reliable tool for assessing Caco-2 permeability during early-stage drug discovery, and that the derived chemical transformation rules could provide insights for optimizing Caco-2 permeability. Scientific contribution: A comprehensive validation of various machine learning algorithms combined with diverse molecular representations for predicting Caco-2 permeability on a large dataset was reported.
The transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets was also investigated. Matched molecular pair analysis was carried out to provide practical suggestions for researchers seeking to improve the Caco-2 permeability of compounds. |
format | Article |
id | doaj-art-52addb2516a84aeab780b3373d74aa1b |
institution | Kabale University |
issn | 1758-2946 |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
series | Journal of Cheminformatics |
affiliations | Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University (Dong Wang, Jieyu Jin, Shimeng Li, Peichen Pan, Dan Li, Yu Kang, Tingjun Hou); Shanghai Qilu Pharmaceutical R&D Center (Guqin Shi, Jingxiao Bao, Zheng Wang) |
title | ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction |
topic | Caco-2 permeability; Machine learning; Matched molecular pair |
url | https://doi.org/10.1186/s13321-025-00947-z |