SGCL-LncLoc: An Interpretable Deep Learning Model for Improving lncRNA Subcellular Localization Prediction with Supervised Graph Contrastive Learning
Understanding the subcellular localization of long non-coding RNAs (lncRNAs) is crucial for unraveling their functional mechanisms. While previous computational methods have made progress in predicting lncRNA subcellular localization, most of them ignore the sequence order information by relying on...
Main Authors: | Min Li, Baoying Zhao, Yiming Li, Pingjian Ding, Rui Yin, Shichao Kan, Min Zeng |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2024-09-01 |
Series: | Big Data Mining and Analytics |
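Subjects: | supervised contrastive learning; long non-coding RNA (lncRNA); subcellular localization prediction; deep learning; graph convolutional network (GCN) |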
Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2024.9020002 |
_version_ | 1832543005945364480 |
---|---|
author | Min Li; Baoying Zhao; Yiming Li; Pingjian Ding; Rui Yin; Shichao Kan; Min Zeng |
author_sort | Min Li |
collection | DOAJ |
description | Understanding the subcellular localization of long non-coding RNAs (lncRNAs) is crucial for unraveling their functional mechanisms. While previous computational methods have made progress in predicting lncRNA subcellular localization, most of them ignore sequence order information because they rely on k-mer frequency features to encode lncRNA sequences. In this study, we develop SGCL-LncLoc, a novel interpretable deep learning model based on supervised graph contrastive learning. SGCL-LncLoc transforms lncRNA sequences into de Bruijn graphs and uses the Word2Vec technique to learn the node representations of each graph. SGCL-LncLoc then applies graph convolutional networks to learn a comprehensive graph representation. Additionally, we propose a computational method that maps the attention weights of the graph nodes to weights of nucleotides in the lncRNA sequence, allowing SGCL-LncLoc to serve as an interpretable deep learning model. Furthermore, SGCL-LncLoc employs a supervised contrastive learning strategy, which leverages the relationships between different samples together with label information to guide the model toward better representation learning for lncRNAs. Extensive experimental results demonstrate that SGCL-LncLoc outperforms both deep learning baseline models and existing predictors, showing its capability for accurate lncRNA subcellular localization prediction. We also conduct a motif analysis, which reveals that SGCL-LncLoc successfully captures known motifs associated with lncRNA subcellular localization. The SGCL-LncLoc web server is available at http://csuligroup.com:8000/SGCL-LncLoc. The source code can be obtained from https://github.com/CSUBioGroup/SGCL-LncLoc. |
format | Article |
id | doaj-art-422169b76ccb4e3a9369e1499a6aa3ee |
institution | Kabale University |
issn | 2096-0654 |
language | English |
publishDate | 2024-09-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Big Data Mining and Analytics |
spelling | doaj-art-422169b76ccb4e3a9369e1499a6aa3ee; 2025-02-03T11:53:25Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654; 2024-09-01; vol. 7, no. 3, pp. 765-780; DOI 10.26599/BDMA.2024.9020002; Min Li (School of Computer Science and Engineering, Central South University, Changsha 410083, China); Baoying Zhao (School of Computer Science and Engineering, Central South University, Changsha 410083, China); Yiming Li (School of Computer Science and Engineering, Central South University, Changsha 410083, China); Pingjian Ding (Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University, Cleveland, OH 44106, USA); Rui Yin (Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32603, USA); Shichao Kan (School of Computer Science and Engineering, Central South University, Changsha 410083, China); Min Zeng (School of Computer Science and Engineering, Central South University, Changsha 410083, China); https://www.sciopen.com/article/10.26599/BDMA.2024.9020002; supervised contrastive learning; long non-coding RNA (lncRNA); subcellular localization prediction; deep learning; graph convolutional network (GCN) |
title | SGCL-LncLoc: An Interpretable Deep Learning Model for Improving lncRNA Subcellular Localization Prediction with Supervised Graph Contrastive Learning |
topic | supervised contrastive learning; long non-coding RNA (lncRNA); subcellular localization prediction; deep learning; graph convolutional network (GCN) |
url | https://www.sciopen.com/article/10.26599/BDMA.2024.9020002 |
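
The description above says that SGCL-LncLoc turns each lncRNA sequence into a de Bruijn graph and later maps node attention weights back to nucleotides. This record does not give the paper's exact construction or mapping rule, so the following is only a minimal sketch under stated assumptions: nodes are the distinct k-mers of the sequence, a directed edge links each k-mer to the k-mer that starts one position later, and a nucleotide's weight is the average weight of the k-mer occurrences that cover it. The function names `build_de_bruijn_graph` and `node_weights_to_nucleotide_weights`, the default `k=3`, and the averaging rule are illustrative choices, not taken from the SGCL-LncLoc source code.

```python
from collections import defaultdict


def build_de_bruijn_graph(seq, k=3):
    """Build a k-mer de Bruijn graph from a sequence (illustrative sketch).

    Returns (kmers, nodes, edges):
      kmers - k-mers in sequence order, one per start position
      nodes - sorted list of distinct k-mers
      edges - dict mapping (kmer_a, kmer_b) to the number of times
              kmer_b immediately follows kmer_a in the sequence
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    nodes = sorted(set(kmers))
    edges = defaultdict(int)
    for a, b in zip(kmers, kmers[1:]):
        edges[(a, b)] += 1
    return kmers, nodes, dict(edges)


def node_weights_to_nucleotide_weights(kmers, node_weight, k):
    """Map per-node weights to per-nucleotide weights (assumed rule).

    Every occurrence of a k-mer contributes its node's weight to the k
    positions it covers; each position's weight is the average of the
    contributions it receives.
    """
    if not kmers:
        return []
    n = len(kmers) + k - 1  # sequence length implied by the k-mer list
    total = [0.0] * n
    cover = [0] * n
    for start, kmer in enumerate(kmers):
        for pos in range(start, start + k):
            total[pos] += node_weight[kmer]
            cover[pos] += 1
    return [t / c for t, c in zip(total, cover)]


if __name__ == "__main__":
    kmers, nodes, edges = build_de_bruijn_graph("AUGAUG", k=3)
    # Hypothetical attention weights for the three nodes of this toy graph.
    attention = {"AUG": 0.6, "UGA": 0.3, "GAU": 0.1}
    print(nodes)
    print(edges)
    print(node_weights_to_nucleotide_weights(kmers, attention, 3))
```

Unlike a k-mer frequency vector, the edge counts in this graph record which k-mer follows which, and that is the sequence order information the description says earlier methods discard. In the model described above, the node weights would come from the attention layer rather than being set by hand.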