Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability

Pseudogenes, once regarded as genomic relics, have emerged as critical regulators of gene expression, influencing cancer, neurodegenerative disorders, and developmental processes. This study introduces an advanced framework for pseudogene classification, leveraging deep learning to address the chall...

Bibliographic Details
Main Authors:	Zeeshan Ahmed, Kashif Munir, Muhammad Usama Tanveer, Syed Rizwan Hassan, Ateeq Ur Rehman, Habib Hamam
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Pseudogene classification; transcriptome analysis; explainable artificial intelligence; deep learning; bioinformatics; AI tools
Online Access:	https://ieeexplore.ieee.org/document/11071306/

collection	DOAJ
description	Pseudogenes, once regarded as genomic relics, have emerged as critical regulators of gene expression, influencing cancer, neurodegenerative disorders, and developmental processes. This study introduces an advanced framework for pseudogene classification, leveraging deep learning to address the challenges of large-scale transcriptomic analysis. The proposed approach integrates an autoencoder for dimensionality reduction, a conditional generative adversarial network (cGAN) for synthetic data generation, and a TabNet classifier for final prediction. Extensive literature on pseudogenes highlights their interaction with coding genes, non-coding RNAs, and epigenetic mechanisms, which are pivotal in transcriptional and post-transcriptional regulation. The pipeline uses SMOTE to mitigate class imbalance and applies synthetic feature augmentation to boost classification performance. Tested on a large-scale curated transcriptomic dataset, our framework achieves an accuracy of 96%, surpassing traditional machine learning models. Visualization tools such as t-SNE, heatmaps, and SHAP plots further enhance model interpretability. The system is fully implemented on accessible AI platforms (Google Colab, Kaggle), ensuring real-time simulation and reproducibility, especially for research teams with limited computational resources. This method offers a scalable, explainable solution for pseudogene identification and non-coding RNA characterization, with broad applications in cancer research, genome annotation, and precision transcriptomics. The results underscore the power of deep learning and generative models in unveiling the regulatory complexity of pseudogenes, contributing to future genomic studies and clinical precision medicine. Additionally, a user-friendly Gradio interface has been developed, enabling interactive exploration and prediction of pseudogene classes, providing a practical tool for biologists and clinicians alike.
id	doaj-art-e659238bf01e45bcb7420fceb610630c
institution	Kabale University
issn	2169-3536
spelling	Record: doaj-art-e659238bf01e45bcb7420fceb610630c (2025-08-20T03:50:16Z); language: eng
	Source: IEEE, IEEE Access, ISSN 2169-3536, 2025-01-01, vol. 13, pp. 118096-118111
	DOI: 10.1109/ACCESS.2025.3585601; document number: 11071306
	Title: Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability
	Authors:
	[0] Zeeshan Ahmed, https://orcid.org/0009-0005-8514-9625, Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab, Pakistan
	[1] Kashif Munir, https://orcid.org/0000-0001-5114-4213, Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab, Pakistan
	[2] Muhammad Usama Tanveer, https://orcid.org/0009-0002-7374-9461, Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab, Pakistan
	[3] Syed Rizwan Hassan, https://orcid.org/0000-0002-6206-3934, Department of Computer Engineering, Gachon University, Seongnam-si, South Korea
	[4] Ateeq Ur Rehman, https://orcid.org/0000-0001-5203-0621, Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamil Nadu, India
	[5] Habib Hamam, https://orcid.org/0000-0002-5320-1012, Faculty of Engineering, Université de Moncton, Moncton, NB, Canada
	Online access: https://ieeexplore.ieee.org/document/11071306/
	Keywords: Pseudogene classification; transcriptome analysis; explainable artificial intelligence; deep learning; bioinformatics; AI tools
title	Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability
topic	Pseudogene classification; transcriptome analysis; explainable artificial intelligence; deep learning; bioinformatics; AI tools
url	https://ieeexplore.ieee.org/document/11071306/
