Data augmentation based on the WGAN-GP with data block to enhance the prediction of genes associated with RNA methylation pathways

Bibliographic Details
Main Authors: Tuo Jiang, Cong Shen, Pingjian Ding, Lingyun Luo
Format: Article
Language: English
Published: Nature Portfolio, 2024-11-01
Series: Scientific Reports
Subjects: RNA methylation; Pathways; Machine learning; Generative adversarial nets
Online Access: https://doi.org/10.1038/s41598-024-77107-0
Collection: DOAJ

Abstract
RNA methylation modification influences various processes in the human body and has gained increasing attention from researchers. Predicting genes associated with RNA methylation pathways can significantly aid biologists in studying RNA methylation processes. Several prediction methods have been investigated, but their performance is still limited by the scarcity of positive samples. To address the data imbalance in RNA methylation-associated gene prediction tasks, this study employed a generative adversarial network, a Wasserstein GAN with gradient penalty (WGAN-GP), to learn the feature distribution of the original dataset. The quality of the synthetic samples was controlled using the Classifier Two-Sample Test (CTST). These synthetic samples were then added to the data blocks to mitigate class imbalance. Experimental results demonstrated that integrating the synthetic samples generated by the proposed model with the original data improves the prediction performance of various classifiers, outperforming other oversampling methods. Moreover, gene ontology (GO) enrichment analyses further support the validity of the predicted genes associated with RNA methylation pathways. The PyTorch implementation of the sample-generation model is available at https://github.com/heyheyheyheyhey1/WGAN-GP_RNA_methylation
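
The quality-control step named in the abstract, the Classifier Two-Sample Test (CTST), can be illustrated with a small sketch. The idea: train a classifier to tell real positive samples from synthetic ones; if its accuracy on held-out data is no better than chance, the synthetic samples are treated as indistinguishable from the real distribution. This is not the authors' PyTorch code. It is a pure-Python illustration that assumes a k-nearest-neighbour classifier, equally sized real and synthetic sets, a 50/50 train/test split, and a one-sided binomial test against chance accuracy 0.5; the function names and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Sequence[float]  # one feature vector (hypothetical gene representation)


def _knn_predict(train: List[Tuple[Sample, int]], x: Sample, k: int) -> int:
    """Predict 0 (real) or 1 (synthetic) by majority vote of the k nearest training points."""
    def sq_dist(item: Tuple[Sample, int]) -> float:
        return sum((a - b) ** 2 for a, b in zip(item[0], x))

    nearest = sorted(train, key=sq_dist)[:k]
    synthetic_votes = sum(label for _, label in nearest)
    # Ties (only possible with an even number of neighbours) default to "real".
    return 1 if 2 * synthetic_votes > len(nearest) else 0


def classifier_two_sample_test(
    real: Sequence[Sample],
    synthetic: Sequence[Sample],
    k: int = 5,
    test_fraction: float = 0.5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (held-out accuracy, one-sided binomial p-value for H0: accuracy = 0.5).

    Assumes len(real) == len(synthetic) so that chance accuracy is about 0.5.
    """
    if len(real) != len(synthetic):
        raise ValueError("this sketch expects equally sized real and synthetic sets")
    rng = random.Random(seed)
    # Label real samples 0 and synthetic samples 1, then shuffle before splitting.
    data = [(x, 0) for x in real] + [(x, 1) for x in synthetic]
    rng.shuffle(data)
    n_test = max(1, int(len(data) * test_fraction))
    test, train = data[:n_test], data[n_test:]
    correct = sum(1 for x, y in test if _knn_predict(train, x, k) == y)
    accuracy = correct / n_test
    # Under H0 the classifier guesses at chance, so `correct` ~ Binomial(n_test, 0.5);
    # the p-value is P(X >= correct).
    tail = sum(math.comb(n_test, i) for i in range(correct, n_test + 1))
    p_value = tail / 2 ** n_test
    return accuracy, p_value


if __name__ == "__main__":
    rng = random.Random(42)
    real = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
    shifted = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(100)]
    # A badly shifted "generator" should be easy to detect: accuracy near 1.0, tiny p-value.
    print(classifier_two_sample_test(real, shifted))
```

A high held-out accuracy with a small p-value means the synthetic samples are easy to separate from real ones and would fail the check; the abstract does not state which classifier or acceptance threshold the paper actually uses.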

Record ID: doaj-art-b33ca71c93d74ab9ade5172d1b73366d
Institution: OA Journals
ISSN: 2045-2322
Author affiliations: School of Computer Science, University of South China (Tuo Jiang, Pingjian Ding, Lingyun Luo); Department of Mathematics, National University of Singapore (Cong Shen)