Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability
Pseudogenes, once regarded as genomic relics, have emerged as critical regulators of gene expression, influencing cancer, neurodegenerative disorders, and developmental processes. This study introduces an advanced framework for pseudogene classification, leveraging deep learning to address the chall...
| Main Authors: | Zeeshan Ahmed, Kashif Munir, Muhammad Usama Tanveer, Syed Rizwan Hassan, Ateeq Ur Rehman, Habib Hamam |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
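| Subjects: | Pseudogene classification; transcriptome analysis; explainable artificial intelligence; deep learning; bioinformatics; AI tools |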
| Online Access: | https://ieeexplore.ieee.org/document/11071306/ |
| _version_ | 1849319975207043072 |
|---|---|
| author | Zeeshan Ahmed; Kashif Munir; Muhammad Usama Tanveer; Syed Rizwan Hassan; Ateeq Ur Rehman; Habib Hamam |
| collection | DOAJ |
| description | Pseudogenes, once regarded as genomic relics, have emerged as critical regulators of gene expression, influencing cancer, neurodegenerative disorders, and developmental processes. This study introduces an advanced framework for pseudogene classification, leveraging deep learning to address the challenges of large-scale transcriptomic analysis. The proposed approach integrates an autoencoder for dimensionality reduction, a conditional generative adversarial network (cGAN) for synthetic data generation, and a TabNet classifier for final prediction. Extensive literature on pseudogenes highlights their interaction with coding genes, non-coding RNAs, and epigenetic mechanisms, which are pivotal in transcriptional and post-transcriptional regulation. The pipeline uses SMOTE to mitigate class imbalance and applies synthetic feature augmentation to boost classification performance. Tested on a large-scale curated transcriptomic dataset, our framework achieves an accuracy of 96%, surpassing traditional machine learning models. Visualization tools such as t-SNE, heatmaps, and SHAP plots further enhance model interpretability. The system is fully implemented on accessible AI platforms (Google Colab, Kaggle), ensuring real-time simulation and reproducibility, especially for research teams with limited computational resources. This method offers a scalable, explainable solution for pseudogene identification and non-coding RNA characterization, with broad applications in cancer research, genome annotation, and precision transcriptomics. The results underscore the power of deep learning and generative models in unveiling the regulatory complexity of pseudogenes, contributing to future genomic studies and clinical precision medicine. Additionally, a user-friendly Gradio interface has been developed, enabling interactive exploration and prediction of pseudogene classes, providing a practical tool for biologists and clinicians alike. |
| format | Article |
| id | doaj-art-e659238bf01e45bcb7420fceb610630c |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-e659238bf01e45bcb7420fceb610630c; 2025-08-20T03:50:16Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 118096-118111; DOI 10.1109/ACCESS.2025.3585601; IEEE document 11071306; Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability; Authors: Zeeshan Ahmed (https://orcid.org/0009-0005-8514-9625), Kashif Munir (https://orcid.org/0000-0001-5114-4213), Muhammad Usama Tanveer (https://orcid.org/0009-0002-7374-9461), Syed Rizwan Hassan (https://orcid.org/0000-0002-6206-3934), Ateeq Ur Rehman (https://orcid.org/0000-0001-5203-0621), Habib Hamam (https://orcid.org/0000-0002-5320-1012); Affiliations, in the order listed: Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab, Pakistan (listed three times); Department of Computer Engineering, Gachon University, Seongnam-si, South Korea; Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamil Nadu, India; Faculty of Engineering, Université de Moncton, Moncton, NB, Canada; https://ieeexplore.ieee.org/document/11071306/; Keywords: Pseudogene classification; transcriptome analysis; explainable artificial intelligence; deep learning; bioinformatics; AI tools |
| title | Deep Pseudogene Categorization and Genome-Wide Transcription Prediction Using GANP-Based Feature Selection and TabNet Interpretability |
| topic | Pseudogene classification; transcriptome analysis; explainable artificial intelligence; deep learning; bioinformatics; AI tools |
| url | https://ieeexplore.ieee.org/document/11071306/ |
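
The description above states that the pipeline uses SMOTE to mitigate class imbalance. As a rough illustration of that idea only, the following is a minimal, simplified sketch of SMOTE-style oversampling in plain Python: each synthetic point is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation; the function name `smote_oversample`, its parameters, and the toy feature vectors are all hypothetical and chosen for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; not taken from the article.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up two-feature vectors standing in for a minority class.
minority = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
majority_count = 10  # assumed size of the majority class

# Generate enough synthetic points to match the majority class size.
new_points = smote_oversample(minority, majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. In practice a maintained library implementation would normally be preferred over a hand-written one.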