Ion channel classification through machine learning and protein language model embeddings

Ion channels are critical membrane proteins that regulate ion flux across cellular membranes, influencing numerous biological functions. The resource-intensive nature of traditional wet lab experiments for ion channel identification has led to an increasing emphasis on computational techniques. This...

Bibliographic Details
Main Authors: Ghazikhani Hamed, Butler Gregory
Format: Article
Language: English
Published: De Gruyter 2024-11-01
Series: Journal of Integrative Bioinformatics
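Subjects: ion channels, membrane proteins, transmembrane proteins, drug discovery, protein language models, convolutional neural network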
Online Access: https://doi.org/10.1515/jib-2023-0047
_version_ 1841556491660689408
author Ghazikhani Hamed
Butler Gregory
author_facet Ghazikhani Hamed
Butler Gregory
author_sort Ghazikhani Hamed
collection DOAJ
description Ion channels are critical membrane proteins that regulate ion flux across cellular membranes, influencing numerous biological functions. The resource-intensive nature of traditional wet lab experiments for ion channel identification has led to an increasing emphasis on computational techniques. This study extends our previous work on protein language models for ion channel prediction, significantly advancing the methodology and performance. We employ a comprehensive array of machine learning algorithms, including k-Nearest Neighbors, Random Forest, Support Vector Machines, and Feed-Forward Neural Networks, alongside a novel Convolutional Neural Network (CNN) approach. These methods leverage fine-tuned embeddings from ProtBERT, ProtBERT-BFD, and MembraneBERT to differentiate ion channels from non-ion channels. Our empirical findings demonstrate that TooT-BERT-CNN-C, which combines features from ProtBERT-BFD and a CNN, substantially surpasses existing benchmarks. On our original dataset, it achieves a Matthews Correlation Coefficient (MCC) of 0.8584 and an accuracy of 98.35 %. More impressively, on a newly curated, larger dataset (DS-Cv2), it attains an MCC of 0.9492 and an ROC AUC of 0.9968 on the independent test set. These results not only highlight the power of integrating protein language models with deep learning for ion channel classification but also underscore the importance of using up-to-date, comprehensive datasets in bioinformatics tasks. Our approach represents a significant advancement in computational methods for ion channel identification, with potential implications for accelerating research in ion channel biology and aiding drug discovery efforts.
format Article
id doaj-art-4cec64d3f56b4034af37e2e35a6d4f86
institution Kabale University
issn 1613-4516
language English
publishDate 2024-11-01
publisher De Gruyter
record_format Article
series Journal of Integrative Bioinformatics
spelling doaj-art-4cec64d3f56b4034af37e2e35a6d4f86
2025-01-07T07:55:54Z
eng
De Gruyter
Journal of Integrative Bioinformatics
1613-4516
2024-11-01
21420220055512
10.1515/jib-2023-0047
Ion channel classification through machine learning and protein language model embeddings
Ghazikhani Hamed, Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
Butler Gregory, Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
https://doi.org/10.1515/jib-2023-0047
ion channels
membrane proteins
transmembrane proteins
drug discovery
protein language models
convolutional neural network
spellingShingle Ghazikhani Hamed
Butler Gregory
Ion channel classification through machine learning and protein language model embeddings
Journal of Integrative Bioinformatics
ion channels
membrane proteins
transmembrane proteins
drug discovery
protein language models
convolutional neural network
title Ion channel classification through machine learning and protein language model embeddings
title_full Ion channel classification through machine learning and protein language model embeddings
title_fullStr Ion channel classification through machine learning and protein language model embeddings
title_full_unstemmed Ion channel classification through machine learning and protein language model embeddings
title_short Ion channel classification through machine learning and protein language model embeddings
title_sort ion channel classification through machine learning and protein language model embeddings
topic ion channels
membrane proteins
transmembrane proteins
drug discovery
protein language models
convolutional neural network
url https://doi.org/10.1515/jib-2023-0047
work_keys_str_mv AT ghazikhanihamed ionchannelclassificationthroughmachinelearningandproteinlanguagemodelembeddings
AT butlergregory ionchannelclassificationthroughmachinelearningandproteinlanguagemodelembeddings