Ion channel classification through machine learning and protein language model embeddings
Ion channels are critical membrane proteins that regulate ion flux across cellular membranes, influencing numerous biological functions. The resource-intensive nature of traditional wet lab experiments for ion channel identification has led to an increasing emphasis on computational techniques. This...
Main Authors: | Ghazikhani Hamed, Butler Gregory |
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter 2024-11-01 |
Series: | Journal of Integrative Bioinformatics |
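Subjects: | ion channels; membrane proteins; transmembrane proteins; drug discovery; protein language models; convolutional neural network |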
Online Access: | https://doi.org/10.1515/jib-2023-0047 |
_version_ | 1841556491660689408 |
---|---|
author | Ghazikhani Hamed Butler Gregory |
author_facet | Ghazikhani Hamed Butler Gregory |
author_sort | Ghazikhani Hamed |
collection | DOAJ |
description | Ion channels are critical membrane proteins that regulate ion flux across cellular membranes, influencing numerous biological functions. The resource-intensive nature of traditional wet lab experiments for ion channel identification has led to an increasing emphasis on computational techniques. This study extends our previous work on protein language models for ion channel prediction, significantly advancing the methodology and performance. We employ a comprehensive array of machine learning algorithms, including k-Nearest Neighbors, Random Forest, Support Vector Machines, and Feed-Forward Neural Networks, alongside a novel Convolutional Neural Network (CNN) approach. These methods leverage fine-tuned embeddings from ProtBERT, ProtBERT-BFD, and MembraneBERT to differentiate ion channels from non-ion channels. Our empirical findings demonstrate that TooT-BERT-CNN-C, which combines features from ProtBERT-BFD and a CNN, substantially surpasses existing benchmarks. On our original dataset, it achieves a Matthews Correlation Coefficient (MCC) of 0.8584 and an accuracy of 98.35 %. More impressively, on a newly curated, larger dataset (DS-Cv2), it attains an MCC of 0.9492 and an ROC AUC of 0.9968 on the independent test set. These results not only highlight the power of integrating protein language models with deep learning for ion channel classification but also underscore the importance of using up-to-date, comprehensive datasets in bioinformatics tasks. Our approach represents a significant advancement in computational methods for ion channel identification, with potential implications for accelerating research in ion channel biology and aiding drug discovery efforts. |
format | Article |
id | doaj-art-4cec64d3f56b4034af37e2e35a6d4f86 |
institution | Kabale University |
issn | 1613-4516 |
language | English |
publishDate | 2024-11-01 |
publisher | De Gruyter |
record_format | Article |
series | Journal of Integrative Bioinformatics |
spelling | doaj-art-4cec64d3f56b4034af37e2e35a6d4f86 2025-01-07T07:55:54Z eng De Gruyter Journal of Integrative Bioinformatics 1613-4516 2024-11-01 21420220055512 10.1515/jib-2023-0047 Ion channel classification through machine learning and protein language model embeddings Ghazikhani Hamed 0 Butler Gregory 1 Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada Ion channels are critical membrane proteins that regulate ion flux across cellular membranes, influencing numerous biological functions. The resource-intensive nature of traditional wet lab experiments for ion channel identification has led to an increasing emphasis on computational techniques. This study extends our previous work on protein language models for ion channel prediction, significantly advancing the methodology and performance. We employ a comprehensive array of machine learning algorithms, including k-Nearest Neighbors, Random Forest, Support Vector Machines, and Feed-Forward Neural Networks, alongside a novel Convolutional Neural Network (CNN) approach. These methods leverage fine-tuned embeddings from ProtBERT, ProtBERT-BFD, and MembraneBERT to differentiate ion channels from non-ion channels. Our empirical findings demonstrate that TooT-BERT-CNN-C, which combines features from ProtBERT-BFD and a CNN, substantially surpasses existing benchmarks. On our original dataset, it achieves a Matthews Correlation Coefficient (MCC) of 0.8584 and an accuracy of 98.35 %. More impressively, on a newly curated, larger dataset (DS-Cv2), it attains an MCC of 0.9492 and an ROC AUC of 0.9968 on the independent test set. These results not only highlight the power of integrating protein language models with deep learning for ion channel classification but also underscore the importance of using up-to-date, comprehensive datasets in bioinformatics tasks. Our approach represents a significant advancement in computational methods for ion channel identification, with potential implications for accelerating research in ion channel biology and aiding drug discovery efforts. https://doi.org/10.1515/jib-2023-0047 ion channels membrane proteins transmembrane proteins drug discovery protein language models convolutional neural network |
spellingShingle | Ghazikhani Hamed Butler Gregory Ion channel classification through machine learning and protein language model embeddings Journal of Integrative Bioinformatics ion channels membrane proteins transmembrane proteins drug discovery protein language models convolutional neural network |
title | Ion channel classification through machine learning and protein language model embeddings |
title_full | Ion channel classification through machine learning and protein language model embeddings |
title_fullStr | Ion channel classification through machine learning and protein language model embeddings |
title_full_unstemmed | Ion channel classification through machine learning and protein language model embeddings |
title_short | Ion channel classification through machine learning and protein language model embeddings |
title_sort | ion channel classification through machine learning and protein language model embeddings |
topic | ion channels membrane proteins transmembrane proteins drug discovery protein language models convolutional neural network |
url | https://doi.org/10.1515/jib-2023-0047 |
work_keys_str_mv | AT ghazikhanihamed ionchannelclassificationthroughmachinelearningandproteinlanguagemodelembeddings AT butlergregory ionchannelclassificationthroughmachinelearningandproteinlanguagemodelembeddings |