Data augmentation based on the WGAN-GP with data block to enhance the prediction of genes associated with RNA methylation pathways
Abstract RNA methylation modification influences various processes in the human body and has gained increasing attention from scholars. Predicting genes associated with RNA methylation pathways can significantly aid biologists in studying RNA methylation processes. Several prediction methods have be...
| Main Authors: | Tuo Jiang, Cong Shen, Pingjian Ding, Lingyun Luo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-11-01 |
| Series: | Scientific Reports |
| Subjects: | RNA methylation; Pathways; Machine learning; Generative adversarial nets |
| Online Access: | https://doi.org/10.1038/s41598-024-77107-0 |
| _version_ | 1850179089143431168 |
|---|---|
| author | Tuo Jiang; Cong Shen; Pingjian Ding; Lingyun Luo |
| collection | DOAJ |
| description | Abstract RNA methylation modification influences various processes in the human body and has gained increasing attention from scholars. Predicting genes associated with RNA methylation pathways can significantly aid biologists in studying RNA methylation processes. Several prediction methods have been investigated, but their performance is still limited by the scarcity of positive samples. To address the challenge of data imbalance in RNA methylation-associated gene prediction tasks, this study employed a generative adversarial network to learn the feature distribution of the original dataset. The quality of synthetic samples was controlled using the Classifier Two-Sample Test (CTST). These synthetic samples were then added to the data blocks to mitigate class distribution imbalance. Experimental results demonstrated that integrating the synthetic samples generated by our proposed model with the original data enhances the prediction performance of various classifiers, outperforming other oversampling methods. Moreover, gene ontology (GO) enrichment analyses further demonstrate the effectiveness of the predicted genes associated with RNA methylation pathways. The model generating gene samples with PyTorch is available at https://github.com/heyheyheyheyhey1/WGAN-GP_RNA_methylation |
| format | Article |
| id | doaj-art-b33ca71c93d74ab9ade5172d1b73366d |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-b33ca71c93d74ab9ade5172d1b73366d; 2025-08-20T02:18:35Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2024-11-01; vol. 14, no. 1, pp. 1-16; DOI 10.1038/s41598-024-77107-0; Data augmentation based on the WGAN-GP with data block to enhance the prediction of genes associated with RNA methylation pathways; Tuo Jiang (School of Computer Science, University of South China); Cong Shen (Department of Mathematics, National University of Singapore); Pingjian Ding (School of Computer Science, University of South China); Lingyun Luo (School of Computer Science, University of South China); https://doi.org/10.1038/s41598-024-77107-0; RNA methylation; Pathways; Machine learning; Generative adversarial nets |
| title | Data augmentation based on the WGAN-GP with data block to enhance the prediction of genes associated with RNA methylation pathways |
| topic | RNA methylation; Pathways; Machine learning; Generative adversarial nets |
| url | https://doi.org/10.1038/s41598-024-77107-0 |
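
The description above states that synthetic sample quality was controlled with a Classifier Two-Sample Test (CTST). As a rough illustration of the idea only, the sketch below trains nothing more than a leave-one-out 1-nearest-neighbour classifier to tell real rows from synthetic rows: accuracy near 0.5 suggests the two sets are hard to distinguish, while accuracy near 1.0 suggests the synthetic rows are easy to spot. The record does not say which classifier or acceptance threshold the authors used, so the 1-NN choice, the function names (`gaussian_rows`, `loo_1nn_accuracy`, `passes_ctst`), the `tolerance` value, and the toy Gaussian data are all assumptions made for this example; it is not the code from the linked repository.

```python
import math
import random


def gaussian_rows(rng, n, dim, mean):
    """Toy stand-in for feature vectors: n rows of dim Gaussian features (assumption)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


def loo_1nn_accuracy(real, synthetic):
    """Leave-one-out 1-NN accuracy at separating real (label 0) from synthetic (label 1)."""
    samples = [(row, 0) for row in real] + [(row, 1) for row in synthetic]
    correct = 0
    for i, (x, label) in enumerate(samples):
        best_dist, best_label = math.inf, None
        for j, (y, other_label) in enumerate(samples):
            if i == j:
                continue  # leave the query point out of its own neighbour search
            d = math.dist(x, y)
            if d < best_dist:
                best_dist, best_label = d, other_label
        correct += int(best_label == label)
    return correct / len(samples)


def passes_ctst(accuracy, tolerance=0.1):
    """Hypothetical acceptance rule: keep a batch if accuracy is within tolerance of 0.5."""
    return abs(accuracy - 0.5) <= tolerance


if __name__ == "__main__":
    rng = random.Random(0)
    real = gaussian_rows(rng, 100, 4, 0.0)
    well_matched = gaussian_rows(rng, 100, 4, 0.0)   # same distribution as real
    poorly_matched = gaussian_rows(rng, 100, 4, 6.0)  # shifted far from real
    for name, batch in [("well matched", well_matched), ("poorly matched", poorly_matched)]:
        acc = loo_1nn_accuracy(real, batch)
        print(name, round(acc, 3), "accepted" if passes_ctst(acc) else "rejected")
```

In a pipeline like the one described, a check of this kind would sit between the generator and the step that adds synthetic samples to the data blocks, so that only batches the classifier cannot reliably separate from real positives are used for augmentation.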