Predicting drug protein interactions based on improved support vector data description in unbalanced data
Introduction: Predicting drug-protein interactions is critical in drug discovery, but traditional laboratory methods are expensive and time-consuming. Computational approaches, especially those leveraging machine learning, are increasingly popular. This paper introduces VASVDD, a multi-step method t...
| Main Authors: | Alireza Khorramfard, Jamshid Pirgazi, Ali Ghanbari Sorkhi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Tabriz University of Medical Sciences, 2024-12-01 |
| Series: | BioImpacts |
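| Subjects: | drug-protein interaction; support vector data; deep learning; variational autoencoder; unbalanced data |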
| Online Access: | https://bi.tbzmed.ac.ir/PDF/bi-15-30468.pdf |
| _version_ | 1850034209954988032 |
|---|---|
| author | Alireza Khorramfard; Jamshid Pirgazi; Ali Ghanbari Sorkhi |
| author_facet | Alireza Khorramfard; Jamshid Pirgazi; Ali Ghanbari Sorkhi |
| author_sort | Alireza Khorramfard |
| collection | DOAJ |
| description | Introduction: Predicting drug-protein interactions is critical in drug discovery, but traditional laboratory methods are expensive and time-consuming. Computational approaches, especially those leveraging machine learning, are increasingly popular. This paper introduces VASVDD, a multi-step method to predict drug-protein interactions. First, it extracts features from the amino acid sequences of proteins and from drug structures. To address the challenge of unbalanced datasets, a Support Vector Data Description (SVDD) approach is employed, which outperforms standard techniques such as SMOTE and ENN in balancing the data. Subsequently, dimensionality reduction using a Variational Autoencoder (VAE) reduces the features from 1074 to 32, improving computational efficiency and predictive performance. Methods: The proposed method was evaluated on four datasets related to enzymes, G-protein-coupled receptors, ion channels, and nuclear receptors. Without preprocessing, the Gradient Boosting Classifier showed bias towards the majority class. However, balancing and dimensionality reduction significantly improved accuracy, sensitivity, specificity, and F1 scores. VASVDD demonstrated superior performance compared to other dimensionality reduction methods, such as kernel principal component analysis (kernel PCA) and principal component analysis (PCA), and was validated across multiple classifiers, achieving higher AUROC values than existing techniques. Results: The results highlight VASVDD's effectiveness and generalizability in predicting drug-target interactions. The method outperforms state-of-the-art techniques in terms of accuracy, robustness, and efficiency, making it a promising tool in bioinformatics for drug discovery. Conclusion: The datasets analyzed during the current study are not publicly available but are available from the corresponding author upon reasonable request, and the source code is available on GitHub: https://github.com/alirezakhorramfard/vasvdd. |
| format | Article |
| id | doaj-art-68d53bda6a0d4bfcb61d7aa3a8aabd27 |
| institution | DOAJ |
| issn | 2228-5652 2228-5660 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Tabriz University of Medical Sciences |
| record_format | Article |
| series | BioImpacts |
| spelling | doaj-art-68d53bda6a0d4bfcb61d7aa3a8aabd27; 2025-08-20T02:57:54Z; eng; Tabriz University of Medical Sciences; BioImpacts; 2228-5652; 2228-5660; 2024-12-01; 15; 1; 30468; 30468; 10.34172/bi.30468; bi-30468; Predicting drug protein interactions based on improved support vector data description in unbalanced data; Alireza Khorramfard, Jamshid Pirgazi, Ali Ghanbari Sorkhi (affiliation for all three authors: Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran); https://bi.tbzmed.ac.ir/PDF/bi-15-30468.pdf; drug-protein interaction; support vector data; deep learning; variational autoencoder; unbalanced data |
| title | Predicting drug protein interactions based on improved support vector data description in unbalanced data |
| topic | drug-protein interaction; support vector data; deep learning; variational autoencoder; unbalanced data |
| url | https://bi.tbzmed.ac.ir/PDF/bi-15-30468.pdf |
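
The description in this record notes that, without balancing, the classifier was biased towards the majority class, and that the method was assessed with accuracy, sensitivity, specificity, and F1. The following is a minimal sketch of why accuracy alone can mislead on unbalanced data. It is not taken from the paper or its GitHub repository; the function name `binary_metrics`, the `Metrics` container, and the toy label vectors are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true interactions that were found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-interactions correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 written in terms of counts; defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return Metrics(accuracy, sensitivity, specificity, f1)


# Hypothetical toy data: 5 interacting pairs (label 1) among 100 candidates.
y_true = [1] * 5 + [0] * 95

# A classifier that always predicts the majority class (no interaction).
majority_only = [0] * 100

# A classifier that finds 4 of the 5 interactions at the cost of 10 false alarms.
less_biased = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

m_major = binary_metrics(y_true, majority_only)
m_less = binary_metrics(y_true, less_biased)

print(m_major)
print(m_less)
```

In this toy setting the majority-only predictor has the higher accuracy but a sensitivity and F1 of zero, which is the kind of bias that reporting sensitivity, specificity, and F1 alongside accuracy is meant to expose.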