Machine Learning Framework for Conotoxin Class and Molecular Target Prediction

Bibliographic Details
Main Authors: Duc P. Truong, Lyman K. Monroe, Robert F. Williams, Hau B. Nguyen
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Toxins
Online Access: https://www.mdpi.com/2072-6651/16/11/475

Description
Conotoxins, small and highly potent neurotoxic peptides derived from the venom of marine cone snails, have attracted the interest of the scientific community because of their pharmacological potential. These toxins display significant sequence and structural diversity, which results in a wide range of specificities for different ion channels and receptors. Despite the recognized importance of these compounds, determining their binding targets and toxicities remains a significant challenge. In particular, predicting the target receptors of conotoxins from their amino acid sequence alone is difficult because of the intricate relationships between structure, function, and target specificity, and because of the significant conformational heterogeneity observed in conotoxins with the same primary sequence. We have previously demonstrated that adding post-translational modifications, collisional cross section (CCS) values, and other structural features to the standard primary sequence features improves the accuracy of distinguishing conotoxins from non-toxic and other toxic peptides across varied datasets and several commonly used machine learning classifiers. Here, we present the effects of these features on conotoxin class and molecular target prediction, in particular, predicting conotoxins that bind to nicotinic acetylcholine receptors (nAChRs). We also demonstrate the use of the Synthetic Minority Oversampling Technique combined with Tomek link removal (SMOTE-Tomek) to balance the datasets while simultaneously making the classes more distinct, by reducing the number of ambiguous samples that nearly overlap between classes.

In predicting the alpha, mu, and omega conotoxin classes, the SMOTE-Tomek PCA PLR model using the combination of the SS and P feature sets achieves the best performance, with an overall accuracy (OA) of 95.95%, an average accuracy (AA) of 93.04%, and an F1 score of 0.959. With this model, we obtained sensitivities of 98.98%, 89.66%, and 90.48% for the alpha, mu, and omega conotoxin classes, respectively. Similarly, in predicting conotoxins that bind to nAChRs, the SMOTE-Tomek PCA SVM model using the CCS and P feature sets demonstrated the highest performance, with 91.3% OA, 91.32% AA, and an F1 score of 0.9131. The sensitivity is 91.46% for conotoxins that bind to nAChRs and 91.18% for conotoxins that do not.
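The description reports both overall accuracy (OA) and average accuracy (AA). The reported AA values equal the unweighted mean of the per-class sensitivities: (98.98 + 89.66 + 90.48) / 3 = 93.04 for the three-class task, and (91.46 + 91.18) / 2 = 91.32 for the nAChR task. The sketch below therefore assumes AA is computed that way (the same quantity scikit-learn calls balanced accuracy) and that OA is the conventional fraction of correct predictions. The function names and the small confusion matrix are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def per_class_sensitivity(confusion: Sequence[Sequence[int]]) -> list[float]:
    """Recall of each class; rows are true classes, columns are predicted classes."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def overall_accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def average_accuracy(sensitivities: Sequence[float]) -> float:
    """Unweighted mean of per-class sensitivities (assumed definition of AA).

    Every class counts equally, however many samples it has, so AA drops
    when a small class is predicted poorly even if OA stays high.
    """
    return sum(sensitivities) / len(sensitivities)


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix, not data from the article.
    cm = [[50, 2], [5, 43]]
    print(overall_accuracy(cm))  # 0.93
    print(average_accuracy(per_class_sensitivity(cm)))

    # Per-class sensitivities (%) as reported in the description above.
    print(round(average_accuracy([98.98, 89.66, 90.48]), 2))  # 93.04
    print(round(average_accuracy([91.46, 91.18]), 2))  # 91.32
```

The gap between OA (95.95%) and AA (93.04%) in the class-prediction task reflects this weighting: the large alpha class has the highest sensitivity, which lifts OA more than AA.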

Publication Details
Affiliations: Duc P. Truong, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; Lyman K. Monroe, Robert F. Williams, and Hau B. Nguyen, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Journal: Toxins, vol. 16, no. 11, article 475
ISSN: 2072-6651
DOI: 10.3390/toxins16110475
Keywords: conotoxins; machine learning; collisional cross section; post-translational modifications; prediction; receptors
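The SMOTE-Tomek balancing step mentioned in the description can be illustrated with a minimal NumPy sketch, assuming the textbook formulation: SMOTE creates synthetic samples for under-represented classes by interpolating between a sample and one of its k nearest same-class neighbours, and a Tomek link is a pair of opposite-class samples that are each other's nearest neighbour. Removing the members of each link thins out the region where classes overlap. This is not the authors' implementation; the function names, the `k=5` default, dense O(n²) distance computation, and the choice to drop both members of every link (rather than only the majority-class member) are assumptions. The imbalanced-learn library provides a maintained `SMOTETomek` class for real use.

```python
import numpy as np


def smote(X, y, k=5, rng=None):
    """Oversample every class up to the size of the largest class.

    Each synthetic sample lies on the segment between a randomly chosen sample
    and one of its k nearest neighbours from the same class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_needed = target - count
        if n_needed == 0:
            continue
        Xc = X[y == cls]
        if len(Xc) < 2:
            raise ValueError(f"class {cls!r} needs at least 2 samples for SMOTE")
        # Dense pairwise distances within the class; a sample is not its own neighbour.
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        k_eff = min(k, len(Xc) - 1)
        neighbours = np.argsort(d, axis=1)[:, :k_eff]
        base = rng.integers(0, len(Xc), size=n_needed)
        partner = neighbours[base, rng.integers(0, k_eff, size=n_needed)]
        gap = rng.random((n_needed, 1))  # interpolation factor in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[partner] - Xc[base]))
        y_parts.append(np.full(n_needed, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def tomek_links(X, y):
    """Boolean mask of samples in a Tomek link: opposite-class mutual nearest neighbours."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    mutual = nn[nn] == np.arange(len(X))
    return mutual & (y[nn] != y)


def smote_tomek(X, y, k=5, rng=None):
    """SMOTE oversampling, then removal of both members of every Tomek link."""
    X_res, y_res = smote(X, y, k=k, rng=rng)
    keep = ~tomek_links(X_res, y_res)
    return X_res[keep], y_res[keep]
```

For example, with 20 samples of one class and 5 of another, `smote` first brings both classes to 20 samples, and `smote_tomek` then discards any boundary pairs, so the result can be slightly smaller than 40 and only approximately balanced. The description pairs this step with PCA and a classifier (PLR or SVM); the exact ordering and hyperparameters are not given in this record.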