Redefining the high variable genes by optimized LOESS regression with positive ratio
Abstract Background Single-cell RNA sequencing allows for the exploration of transcriptomic features at the individual cell level, but the high dimensionality and sparsity of the data pose substantial challenges for downstream analysis. Feature selection, therefore, is a critical step to reduce dime...
| Main Authors: | Yue Xie, Zehua Jing, Hailin Pan, Xun Xu, Qi Fang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-04-01 |
| Series: | BMC Bioinformatics |
| Subjects: | Single cell transcriptome; High variable genes; Feature selection |
| Online Access: | https://doi.org/10.1186/s12859-025-06112-5 |
| _version_ | 1849699134356848640 |
|---|---|
| author | Yue Xie; Zehua Jing; Hailin Pan; Xun Xu; Qi Fang |
| author_sort | Yue Xie |
| collection | DOAJ |
| description | Abstract. Background: Single-cell RNA sequencing allows transcriptomic features to be explored at the level of individual cells, but the high dimensionality and sparsity of the data pose substantial challenges for downstream analysis. Feature selection is therefore a critical step for reducing dimensionality and enhancing interpretability. Results: We developed GLP, a robust feature selection algorithm that uses optimized locally estimated scatterplot smoothing (LOESS) regression to capture the relationship between a gene's average expression level and its positive ratio while minimizing overfitting. In our evaluations, the algorithm consistently outperformed eight leading feature selection methods across three benchmark criteria and improved downstream analysis, offering a significant improvement in gene subset selection. Conclusions: By preserving key biological information through feature selection, GLP provides informative features that enhance the accuracy and effectiveness of downstream analyses. |
| format | Article |
| id | doaj-art-61bdabffbe8a4a078a908e5acfe0772f |
| institution | DOAJ |
| issn | 1471-2105 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Bioinformatics |
| spelling | doaj-art-61bdabffbe8a4a078a908e5acfe0772f; 2025-08-20T03:18:42Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-04-01; 26; 1; 1; 17; 10.1186/s12859-025-06112-5; Redefining the high variable genes by optimized LOESS regression with positive ratio; Yue Xie (College of Life Sciences, University of Chinese Academy of Sciences); Zehua Jing (College of Life Sciences, University of Chinese Academy of Sciences); Hailin Pan (BGI Research); Xun Xu (College of Life Sciences, University of Chinese Academy of Sciences); Qi Fang (BGI Research); https://doi.org/10.1186/s12859-025-06112-5; Single cell transcriptome; High variable genes; Feature selection |
| title | Redefining the high variable genes by optimized LOESS regression with positive ratio |
| topic | Single cell transcriptome; High variable genes; Feature selection |
| url | https://doi.org/10.1186/s12859-025-06112-5 |
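
The abstract in this record describes selecting genes by fitting a LOESS curve between each gene's average expression level and its positive ratio (the fraction of cells in which the gene is detected). The following is a minimal sketch of that general idea, not the authors' GLP implementation. Several choices are assumptions made only for illustration: regressing mean expression on positive ratio, ranking genes by how far they sit above the fitted curve, using a fixed span `frac` instead of the optimized span the abstract mentions, and all function names.

```python
def positive_ratio_and_mean(counts):
    """For each gene (a list of per-cell counts), return (positive ratio, mean)."""
    stats = []
    for gene in counts:
        n_cells = len(gene)
        ratio = sum(1 for c in gene if c > 0) / n_cells
        mean = sum(gene) / n_cells
        stats.append((ratio, mean))
    return stats


def loess_fit(xs, ys, frac=0.5):
    """Local linear regression with tricube weights, evaluated at every x.

    frac is the share of points in each local window (an assumed fixed
    value here; the abstract describes an optimized choice).
    """
    n = len(xs)
    k = max(2, int(frac * n))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point (self included).
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        swx = sum(wi * x for wi, x in zip(w, xs))
        swy = sum(wi * y for wi, y in zip(w, ys))
        swxx = sum(wi * x * x for wi, x in zip(w, xs))
        swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            # Degenerate window: fall back to the weighted mean.
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


def select_genes(counts, n_top, frac=0.5):
    """Return indices of the n_top genes lying furthest above the fitted curve."""
    stats = positive_ratio_and_mean(counts)
    xs = [ratio for ratio, _ in stats]
    ys = [mean for _, mean in stats]
    fitted = loess_fit(xs, ys, frac)
    residuals = [y - f for y, f in zip(ys, fitted)]
    order = sorted(range(len(counts)), key=lambda i: residuals[i], reverse=True)
    return order[:n_top]
```

Here `counts` is a gene-by-cell list of raw counts. A gene detected in the same share of cells as its neighbours on the curve, but expressed at a much higher level in those cells, receives a large positive residual and is ranked first.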