Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability

Bibliographic Details
Main Authors: Zeeshan Ahmed, Kashif Munir, Muhammad Usama Tanveer, Syed Rizwan Hassan, Ateeq Ur Rehman, Habib Hamam
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
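Subjects: Pseudogene classification; transcriptome analysis; explainable artificial intelligence; deep learning; bioinformatics; AI tools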
Online Access: https://ieeexplore.ieee.org/document/11071306/
author Zeeshan Ahmed
Kashif Munir
Muhammad Usama Tanveer
Syed Rizwan Hassan
Ateeq Ur Rehman
Habib Hamam
collection DOAJ
description Pseudogenes, once regarded as genomic relics, have emerged as critical regulators of gene expression, influencing cancer, neurodegenerative disorders, and developmental processes. This study introduces an advanced framework for pseudogene classification, leveraging deep learning to address the challenges of large-scale transcriptomic analysis. The proposed approach integrates an autoencoder for dimensionality reduction, a conditional generative adversarial network (cGAN) for synthetic data generation, and a TabNet classifier for final prediction. Extensive literature on pseudogenes highlights their interaction with coding genes, non-coding RNAs, and epigenetic mechanisms, which are pivotal in transcriptional and post-transcriptional regulation. The pipeline uses SMOTE to mitigate class imbalance and applies synthetic feature augmentation to boost classification performance. Tested on a large-scale curated transcriptomic dataset, our framework achieves an accuracy of 96%, surpassing traditional machine learning models. Visualization tools such as t-SNE, heatmaps, and SHAP plots further enhance model interpretability. The system is fully implemented on accessible AI platforms (Google Colab, Kaggle), ensuring real-time simulation and reproducibility, especially for research teams with limited computational resources. This method offers a scalable, explainable solution for pseudogene identification and non-coding RNA characterization, with broad applications in cancer research, genome annotation, and precision transcriptomics. The results underscore the power of deep learning and generative models in unveiling the regulatory complexity of pseudogenes, contributing to future genomic studies and clinical precision medicine. Additionally, a user-friendly Gradio interface has been developed, enabling interactive exploration and prediction of pseudogene classes, providing a practical tool for biologists and clinicians alike.
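The description mentions SMOTE as the step that mitigates class imbalance. This record does not include the authors' code, so the following is only a minimal sketch of the general SMOTE interpolation idea, written with the Python standard library. The function name `smote_oversample`, its parameters, and the toy feature vectors are all hypothetical and are not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic samples built by interpolating between a
    minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data, made up for illustration: 3 minority samples vs. 6 majority samples.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
majority_count = 6
new_points = smote_oversample(minority, majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. In practice a maintained library implementation would normally be preferred over a hand-rolled one like this.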
format Article
id doaj-art-e659238bf01e45bcb7420fceb610630c
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-e659238bf01e45bcb7420fceb610630c
2025-08-20T03:50:16Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
Volume 13, pages 118096-118111
DOI 10.1109/ACCESS.2025.3585601
IEEE document number 11071306
Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability
Zeeshan Ahmed (0) https://orcid.org/0009-0005-8514-9625
Kashif Munir (1) https://orcid.org/0000-0001-5114-4213
Muhammad Usama Tanveer (2) https://orcid.org/0009-0002-7374-9461
Syed Rizwan Hassan (3) https://orcid.org/0000-0002-6206-3934
Ateeq Ur Rehman (4) https://orcid.org/0000-0001-5203-0621
Habib Hamam (5) https://orcid.org/0000-0002-5320-1012
Affiliations (in the order listed):
Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab, Pakistan
Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab, Pakistan
Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab, Pakistan
Department of Computer Engineering, Gachon University, Seongnam-si, South Korea
Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamil Nadu, India
Faculty of Engineering, Université de Moncton, Moncton, NB, Canada
https://ieeexplore.ieee.org/document/11071306/
Pseudogene classification
transcriptome analysis
explainable artificial intelligence
deep learning
bioinformatics
AI tools
title Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability
topic Pseudogene classification
transcriptome analysis
explainable artificial intelligence
deep learning
bioinformatics
AI tools
url https://ieeexplore.ieee.org/document/11071306/