Predicting the Toxicity of Drug Molecules with Selecting Effective Descriptors Using a Binary Ant Colony Optimization (BACO) Feature Selection Approach

Bibliographic Details
Main Authors: Yuanyuan Dan, Junhao Ruan, Zhenghua Zhu, Hualong Yu
Format: Article
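Language: English
Published: MDPI AG 2025-03-01
Series: Molecules
Subjects: toxicity prediction; molecule descriptors; quantitative structure–activity relationship (QSAR); feature selection; binary ant colony optimization; Tox21 challenge
Online Access: https://www.mdpi.com/1420-3049/30/7/1548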
collection DOAJ
description Predicting the toxicity of drug molecules with in silico quantitative structure–activity relationship (QSAR) approaches helps guide safe drug development and accelerates the drug development process. Ongoing advances in machine learning have made this task easier and more accurate, but prediction still suffers from two problems: a severely skewed distribution of active versus inactive chemicals, and a relatively high-dimensional feature space. To address both issues simultaneously, this study proposes a binary ant colony optimization feature selection algorithm called BACO. The algorithm divides the labeled drug molecules into a training set and a validation set multiple times; for each division, the ant colony searches for a feature group that maximizes a weighted combination of three class-imbalance performance metrics (F-measure, G-mean, and MCC) on the validation set. After all divisions have been run, the frequency with which each feature (descriptor) appears in the optimal feature groups is calculated, and the features are ranked in descending order of frequency. Only the high-frequency features are used to train a support vector machine (SVM) and construct the structure–activity relationship (SAR) prediction model. Experimental results on the 12 datasets of the Tox21 challenge, with molecules represented by descriptors from the Mordred descriptor calculator, show that the proposed BACO method significantly outperforms several traditional feature selection approaches widely used in QSAR analysis. For most datasets, it reaches its best performance with only a few to a few dozen descriptors, which indicates its effectiveness and potential application value in cheminformatics.
id doaj-art-54b4faf5a3ad4671b2be459e070a3c24
institution DOAJ
issn 1420-3049
spelling doaj-art-54b4faf5a3ad4671b2be459e070a3c24; 2025-08-20T03:08:57Z; eng; MDPI AG; Molecules; 1420-3049; 2025-03-01; 30; 7; 1548; 10.3390/molecules30071548
Yuanyuan Dan (0), Junhao Ruan (1), Zhenghua Zhu (2): School of Environmental and Chemical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212100, China
Hualong Yu (3): School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
topic toxicity prediction
molecule descriptors
quantitative structure–activity relationship (QSAR)
feature selection
binary ant colony optimization
Tox21 challenge
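
The procedure summarized in the description field (repeated train/validation divisions, a colony search per division scored by a weighted mix of F-measure, G-mean, and MCC, then frequency ranking of the selected descriptors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no algorithmic details, so the equal metric weights, the two-pheromone-per-feature scheme, the best-so-far update rule, and every function and parameter name below are assumptions made for illustration. The `fitness` callable stands in for "train an SVM on the training split with the masked features and score it on the validation split".

```python
import math
import random
from collections import Counter


def imbalance_score(tp, fp, tn, fn, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted combination of F-measure, G-mean and MCC from confusion counts.

    Equal weights are an assumption; the record does not state the weighting.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    w_f, w_g, w_m = weights
    return w_f * f_measure + w_g * g_mean + w_m * mcc


def binary_aco(n_features, fitness, n_ants=10, n_iters=20, rho=0.1, rng=None):
    """Generic binary ACO sketch (hypothetical variant, not the paper's).

    Each feature j keeps two pheromone values: tau[j][0] for "skip" and
    tau[j][1] for "select". An ant selects feature j with probability
    tau[j][1] / (tau[j][0] + tau[j][1]).
    """
    rng = rng or random.Random(0)
    tau = [[1.0, 1.0] for _ in range(n_features)]
    best_mask, best_fit = None, -math.inf
    for _ in range(n_iters):
        for _ in range(n_ants):
            mask = [1 if rng.random() < t[1] / (t[0] + t[1]) else 0
                    for t in tau]
            fit = fitness(mask)
            if fit > best_fit:
                best_mask, best_fit = mask, fit
        # Evaporate all pheromone, then reinforce the best-so-far choices.
        for j, t in enumerate(tau):
            t[0] *= 1 - rho
            t[1] *= 1 - rho
            t[best_mask[j]] += rho
    return best_mask, best_fit


def rank_by_frequency(masks):
    """Rank feature indices by how often they appear across the best masks."""
    counts = Counter(j for mask in masks for j, bit in enumerate(mask) if bit)
    return [j for j, _ in counts.most_common()]
```

In use, one would call `binary_aco` once per train/validation division, collect the returned masks, pass them to `rank_by_frequency`, and train the final classifier on the top-ranked descriptors only. How many top-ranked descriptors to keep is a tuning choice that this sketch leaves open.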