Predicting drug protein interactions based on improved support vector data description in unbalanced data

Bibliographic Details
Main Authors: Alireza Khorramfard, Jamshid Pirgazi, Ali Ghanbari Sorkhi
Format: Article
Language: English
Published: Tabriz University of Medical Sciences, 2024-12-01
Series: BioImpacts
Online Access:https://bi.tbzmed.ac.ir/PDF/bi-15-30468.pdf
collection DOAJ
description Introduction: Predicting drug-protein interactions is critical in drug discovery, but traditional laboratory methods are expensive and time-consuming. Computational approaches, especially those based on machine learning, are increasingly popular. This paper introduces VASVDD, a multi-step method for predicting drug-protein interactions. First, features are extracted from the amino acid sequences of proteins and from drug structures. To address the challenge of unbalanced datasets, a Support Vector Data Description (SVDD) approach is then used to balance the data; it outperforms standard techniques such as SMOTE (Synthetic Minority Over-sampling Technique) and ENN (Edited Nearest Neighbours). Subsequently, a Variational Autoencoder (VAE) reduces the feature dimension from 1074 to 32, improving both computational efficiency and predictive performance.
Methods: The proposed method was evaluated on four datasets covering enzymes, G-protein-coupled receptors, ion channels, and nuclear receptors. Without preprocessing, the Gradient Boosting Classifier was biased towards the majority class; balancing and dimensionality reduction significantly improved accuracy, sensitivity, specificity, and F1 score. VASVDD outperformed other dimensionality reduction methods, such as principal component analysis (PCA) and kernel PCA, and was validated across multiple classifiers, achieving higher AUROC values than existing techniques.
Results: The results highlight VASVDD's effectiveness and generalizability in predicting drug-target interactions. The method outperforms state-of-the-art techniques in accuracy, robustness, and efficiency, making it a promising bioinformatics tool for drug discovery.
Conclusion: The datasets analyzed during the current study are not publicly available but can be obtained from the corresponding author upon reasonable request; the source code is available on GitHub: https://github.com/alirezakhorramfard/vasvdd.
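The abstract reports that, without balancing, the Gradient Boosting Classifier was biased towards the majority class, and it judges the pipeline by accuracy, sensitivity, specificity, F1 and AUROC. As a hedged illustration (not code from the VASVDD repository), the pure-Python sketch below computes these metrics from their standard definitions, assuming label 1 marks an interacting drug-protein pair; the function names `confusion_counts`, `threshold_metrics` and `auroc` are hypothetical. It shows why accuracy alone can look good on unbalanced data.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) for 0/1 labels, where 1 marks an interacting drug-protein pair."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0: a missed interaction
            fn += 1
    return tp, fp, tn, fn


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty (e.g. no positive predictions).
    return num / den if den else 0.0


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)  # recall on the positive (minority) class
    specificity = _ratio(tn, tn + fp)  # recall on the negative (majority) class
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: probability that a random positive outscores a random negative.

    Ties count as half a win. The O(P*N) pairwise loop is for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy unbalanced set: 10 interacting pairs (label 1) and 90 non-interacting pairs (label 0).
    y_true = [1] * 10 + [0] * 90
    # A classifier that always predicts the majority class ("no interaction").
    majority_only = [0] * 100
    print(threshold_metrics(y_true, majority_only))
    # {'accuracy': 0.9, 'sensitivity': 0.0, 'specificity': 1.0, 'f1': 0.0}
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The majority-only predictor reaches 90% accuracy while detecting none of the interactions, the kind of bias the abstract attributes to the unpreprocessed Gradient Boosting Classifier; sensitivity, F1 and AUROC expose it. In practice a library routine such as scikit-learn's `roc_auc_score` would normally replace the quadratic AUROC loop.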
id doaj-art-68d53bda6a0d4bfcb61d7aa3a8aabd27
issn 2228-5652
2228-5660
doi 10.34172/bi.30468
citation BioImpacts, vol. 15, no. 1, article 30468
affiliation Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran (all three authors)
topic drug-protein interaction
support vector data
deep learning
variational autoencoder
unbalanced data