Predicting lncRNA and disease associations with graph autoencoder and noise robust gradient boosting

Bibliographic Details
Main Authors: Lili Tang, Liangliang Huang, Yi Yuan
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-03269-0
collection DOAJ
description Abstract: lncRNAs are closely linked to many human diseases. Identifying new lncRNA-disease associations (LDAs) helps decipher disease mechanisms, find new biomarkers, and advance diagnosis and treatment. In this manuscript, we devise an LDA prediction framework called LDA-GARB. LDA-GARB first applies nonnegative matrix factorization to extract linear features of lncRNAs and diseases. Next, it computes lncRNA similarity and disease similarity and uses a graph autoencoder to extract nonlinear features of lncRNAs and diseases. The extracted features are then concatenated into a single vector. Finally, this vector is fed to a noise-robust gradient boosting model that uncovers potential associations among unknown lncRNA-disease pairs. To evaluate LDA-GARB, we used precision, recall, accuracy, F1-score, AUC, and AUPR as metrics and performed multiple comparison experiments. First, it was benchmarked against four representative LDA prediction methods (SDLDA, LDNFSGB, LDAenDL, and LDA-VGHB) under 5-fold cross-validation on lncRNAs, diseases, and lncRNA-disease pairs. Next, it was compared with four representative boosting models (XGBoost, AdaBoost, CatBoost, and LightGBM) under the same three cross-validation settings. Subsequently, LDA-GARB was evaluated against LDA-LNSUBRW, GAMCLDA, LDA-VGHB, LDAGM, and GANLDA on imbalanced data. We also performed parameter sensitivity analysis and ablation experiments. The results demonstrated that LDA-GARB improved LDA prediction. Finally, LDA-GARB was applied to predict potential associated lncRNAs for colorectal cancer and breast cancer. CCDC26 and HAR1A were inferred to be associated with colorectal cancer and breast cancer, respectively. LDA-GARB is freely available at https://github.com/smiling199/LDA-GARB.
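The abstract mentions 5-fold cross-validation "on lncRNAs, diseases, and lncRNA-disease pairs" without spelling out how the folds are built. The sketch below is one plausible reading, not the authors' code: it assumes that the lncRNA setting holds out every pair involving a fold of lncRNAs, the disease setting holds out every pair involving a fold of diseases, and the pair setting holds out individual pairs. The function name `five_fold_splits` and its parameters are hypothetical.

```python
import random


def five_fold_splits(n_lncrnas, n_diseases, mode, k=5, seed=0):
    """Yield (train_pairs, test_pairs) for k-fold CV over all (lncRNA, disease) pairs.

    Assumed interpretation (not confirmed by the record):
      mode="lncrna"  -> hold out all pairs of the lncRNAs in the fold
      mode="disease" -> hold out all pairs of the diseases in the fold
      mode="pair"    -> hold out individual lncRNA-disease pairs
    """
    pairs = [(i, j) for i in range(n_lncrnas) for j in range(n_diseases)]
    if mode == "lncrna":
        units, key = list(range(n_lncrnas)), (lambda p: p[0])
    elif mode == "disease":
        units, key = list(range(n_diseases)), (lambda p: p[1])
    elif mode == "pair":
        units, key = list(pairs), (lambda p: p)
    else:
        raise ValueError("mode must be 'lncrna', 'disease', or 'pair'")

    # Shuffle the units being held out, then deal them into k folds.
    rng = random.Random(seed)
    rng.shuffle(units)
    folds = [set(units[f::k]) for f in range(k)]

    for held_out in folds:
        test = [p for p in pairs if key(p) in held_out]
        train = [p for p in pairs if key(p) not in held_out]
        yield train, test


if __name__ == "__main__":
    for mode in ("lncrna", "disease", "pair"):
        sizes = [len(test) for _, test in five_fold_splits(10, 6, mode)]
        print(mode, sizes)
```

Under this reading, the lncRNA and disease settings test whether a model generalizes to entities with no known associations in training, while the pair setting only hides individual entries of the association matrix.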
id doaj-art-a42d61df3d5e4cebb057941ba442ff5d
institution Kabale University
issn 2045-2322
spelling doaj-art-a42d61df3d5e4cebb057941ba442ff5d, 2025-08-20T04:02:46Z. Lili Tang (School of Computer Science, Hunan University of Technology); Liangliang Huang (School of Information Technology and Administration, Hunan University of Finance and Economics); Yi Yuan (School of Computer Science, Hunan University of Technology). Scientific Reports, vol. 15, no. 1, pp. 1-23, 2025-05-01.
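The six evaluation metrics named in the abstract (precision, recall, accuracy, F1-score, AUC, and AUPR) can be computed from labels and predicted scores as in the following minimal sketch. It is illustrative only and not taken from the LDA-GARB repository: the function names, the 0.5 decision threshold, and the use of average precision as the AUPR estimate are assumptions.

```python
def threshold_metrics(y_true, y_score, threshold=0.5):
    """Precision, recall, accuracy, and F1 at a fixed threshold (assumed 0.5)."""
    tp = fp = tn = fn = 0
    for t, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def auc_and_aupr(y_true, y_score):
    """AUC via pairwise ranking; AUPR approximated by average precision."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # AUC: fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    # Average precision: mean precision at the rank of each positive.
    # Tied scores keep their input order here, which is an arbitrary choice.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / rank
    return auc, total / len(pos)


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    print(threshold_metrics(labels, scores))
    print(auc_and_aupr(labels, scores))
```

AUPR is usually the more informative of the two ranking metrics when known associations are far rarer than unknown pairs, which is the imbalanced setting the abstract refers to.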