Redefining the high variable genes by optimized LOESS regression with positive ratio


Bibliographic Details
Main Authors: Yue Xie, Zehua Jing, Hailin Pan, Xun Xu, Qi Fang
Format: Article
Language: English
Published: BMC 2025-04-01
Series: BMC Bioinformatics
Subjects: Single cell transcriptome; High variable genes; Feature selection
Online Access: https://doi.org/10.1186/s12859-025-06112-5
author Yue Xie
Zehua Jing
Hailin Pan
Xun Xu
Qi Fang
collection DOAJ
description Abstract. Background: Single-cell RNA sequencing allows for the exploration of transcriptomic features at the individual cell level, but the high dimensionality and sparsity of the data pose substantial challenges for downstream analysis. Feature selection, therefore, is a critical step to reduce dimensionality and enhance interpretability. Results: We developed a robust feature selection algorithm that leverages optimized locally estimated scatterplot smoothing regression (LOESS) to precisely capture the relationship between gene average expression level and positive ratio while minimizing overfitting. Our evaluations showed that our algorithm consistently outperforms eight leading feature selection methods across three benchmark criteria and helps improve downstream analysis, thus offering a significant improvement in gene subset selection. Conclusions: By preserving key biological information through feature selection, GLP provides informative features to enhance the accuracy and effectiveness of downstream analyses.
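The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' GLP implementation: the choice of positive ratio as the predictor and average expression as the response, the fixed smoothing fraction `frac`, the ranking by positive residual, and all function names (`gene_stats`, `loess_fit`, `select_by_residual`) are assumptions made for illustration, and the bandwidth optimization mentioned in the abstract is not reproduced.

```python
import math


def gene_stats(counts):
    """Per-gene mean expression and positive ratio (fraction of cells with count > 0).

    `counts` is a list of cells, each a list of per-gene counts.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    mean = [sum(row[g] for row in counts) / n_cells for g in range(n_genes)]
    pos = [sum(1 for row in counts if row[g] > 0) / n_cells for g in range(n_genes)]
    return mean, pos


def tricube(u):
    """Tricube kernel weight, zero outside |u| < 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess_fit(x, y, frac=0.5):
    """Locally weighted linear regression, evaluated at every x[i].

    The bandwidth at each point is the distance to its k-th nearest
    neighbour, with k = ceil(frac * n). `frac` is fixed here (assumption).
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # guard against a zero bandwidth
        w = [tricube((xi - x0) / h) for xi in x]
        sw = sum(w)  # always > 0 because x0 itself has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def select_by_residual(mean, pos, n_top=1, frac=0.5):
    """Rank genes by how far mean expression exceeds the LOESS trend
    over positive ratio, and return the indices of the top `n_top` genes."""
    fitted = loess_fit(pos, mean, frac)
    resid = [m - f for m, f in zip(mean, fitted)]
    order = sorted(range(len(resid)), key=lambda g: resid[g], reverse=True)
    return order[:n_top]
```

In this sketch a gene is retained when its average expression sits well above what the smoothed curve predicts for its positive ratio. A real analysis would more likely use an established LOESS implementation and a data-driven choice of the smoothing span.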
format Article
id doaj-art-61bdabffbe8a4a078a908e5acfe0772f
institution DOAJ
issn 1471-2105
language English
publishDate 2025-04-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj-art-61bdabffbe8a4a078a908e5acfe0772f; 2025-08-20T03:18:42Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-04-01; 26; 1; 1; 17; 10.1186/s12859-025-06112-5; Redefining the high variable genes by optimized LOESS regression with positive ratio
Yue Xie (0): College of Life Sciences, University of Chinese Academy of Sciences
Zehua Jing (1): College of Life Sciences, University of Chinese Academy of Sciences
Hailin Pan (2): BGI Research
Xun Xu (3): College of Life Sciences, University of Chinese Academy of Sciences
Qi Fang (4): BGI Research
title Redefining the high variable genes by optimized LOESS regression with positive ratio
topic Single cell transcriptome
High variable genes
Feature selection
url https://doi.org/10.1186/s12859-025-06112-5