Predicting the Toxicity of Drug Molecules with Selecting Effective Descriptors Using a Binary Ant Colony Optimization (BACO) Feature Selection Approach

Bibliographic Details
Main Authors: Yuanyuan Dan, Junhao Ruan, Zhenghua Zhu, Hualong Yu
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Molecules
Online Access: https://www.mdpi.com/1420-3049/30/7/1548
Description
Summary: Predicting the toxicity of drug molecules with in silico quantitative structure–activity relationship (QSAR) approaches helps guide safe drug development and accelerates the development process. Advances in machine learning have made this task easier and more accurate, but prediction still suffers from two problems: a severely skewed distribution of active versus inactive chemicals and a relatively high-dimensional feature space. To address both issues simultaneously, this study proposes a binary ant colony optimization feature selection algorithm called BACO. The algorithm repeatedly divides the labeled drug molecules into a training set and a validation set; for each division, the ant colony searches for a feature group that maximizes a weighted combination of three class-imbalance performance metrics (F-measure, G-mean, and MCC) on the validation set. After all divisions have been run, the frequency with which each feature (descriptor) appears in the optimal feature groups is calculated, and the features are ranked in descending order of frequency. Only the high-frequency features are used to train a support vector machine (SVM) and construct the structure–activity relationship (SAR) prediction model. Experimental results on the 12 datasets of the Tox21 challenge, with molecules represented by descriptors from the Mordred descriptor calculator, show that BACO significantly outperforms several traditional feature selection approaches widely used in QSAR analysis. For most datasets it needs only a few to a few dozen descriptors to reach its best performance, which demonstrates its effectiveness and potential value in cheminformatics.
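The two bookkeeping steps named in the summary, scoring a candidate feature group and ranking descriptors by how often they are selected, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: it does not include the ant colony search itself, the equal weights are an assumption (the summary does not state the weighting), and the function names `imbalance_fitness` and `rank_by_frequency` as well as the descriptor names in the usage lines are hypothetical.

```python
import math
from collections import Counter


def imbalance_fitness(y_true, y_pred, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of F-measure, G-mean and MCC for binary labels (1 = active).

    The equal default weights are an assumption made for illustration.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # F-measure: harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    # G-mean: geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    # MCC: Matthews correlation coefficient, defined as 0 when undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    w_f, w_g, w_m = weights
    return w_f * f_measure + w_g * g_mean + w_m * mcc


def rank_by_frequency(best_subsets, top_k):
    """Count how often each descriptor appears across the per-division
    best feature groups and return the top_k most frequent names."""
    counts = Counter(name for subset in best_subsets for name in set(subset))
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_k]]


# Usage with placeholder data (descriptor names are hypothetical examples).
score = imbalance_fitness([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
selected = rank_by_frequency(
    [["SLogP", "TPSA"], ["TPSA", "MW"], ["TPSA", "SLogP"]], top_k=2
)
```

Note that MCC ranges over [-1, 1] while F-measure and G-mean range over [0, 1], so the choice of weights affects how strongly each metric drives the search. The descriptors returned by `rank_by_frequency` would then be the inputs used to train the final SVM.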
ISSN: 1420-3049