Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability

Bibliographic Details
Main Authors: Zeeshan Ahmed, Kashif Munir, Muhammad Usama Tanveer, Syed Rizwan Hassan, Ateeq Ur Rehman, Habib Hamam
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11071306/
Description
Summary: Pseudogenes, once regarded as genomic relics, have emerged as critical regulators of gene expression, influencing cancer, neurodegenerative disorders, and developmental processes. This study introduces an advanced framework for pseudogene classification, leveraging deep learning to address the challenges of large-scale transcriptomic analysis. The proposed approach integrates an autoencoder for dimensionality reduction, a conditional generative adversarial network (cGAN) for synthetic data generation, and a TabNet classifier for final prediction. Extensive literature on pseudogenes highlights their interaction with coding genes, non-coding RNAs, and epigenetic mechanisms, which are pivotal in transcriptional and post-transcriptional regulation. The pipeline uses SMOTE to mitigate class imbalance and applies synthetic feature augmentation to boost classification performance. Tested on a large-scale curated transcriptomic dataset, our framework achieves an accuracy of 96%, surpassing traditional machine learning models. Visualization tools such as t-SNE, heatmaps, and SHAP plots further enhance model interpretability. The system is fully implemented on accessible AI platforms (Google Colab, Kaggle), ensuring real-time simulation and reproducibility, especially for research teams with limited computational resources. This method offers a scalable, explainable solution for pseudogene identification and non-coding RNA characterization, with broad applications in cancer research, genome annotation, and precision transcriptomics. The results underscore the power of deep learning and generative models in unveiling the regulatory complexity of pseudogenes, contributing to future genomic studies and clinical precision medicine. Additionally, a user-friendly Gradio interface has been developed, enabling interactive exploration and prediction of pseudogene classes, providing a practical tool for biologists and clinicians alike.
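Illustrative note: the summary mentions SMOTE for class imbalance without giving details. The sketch below shows the general SMOTE idea only (interpolating between a minority-class sample and one of its nearest minority-class neighbours), using the Python standard library. The function name `smote_oversample`, its parameters, and the toy data are assumptions made for illustration and are not taken from the article, which may well rely on a library implementation with different settings.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new row at a random point on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four two-feature rows (made-up values).
minority_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote_oversample(minority_rows, n_new=6, k=2)
```

Each synthetic row lies on a line segment between two existing minority rows, so it stays inside the region already occupied by that class. In a pipeline like the one described, such oversampling would typically be applied to the training split only.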
ISSN: 2169-3536