Multi-strategy fusion binary SHO guided by Pearson correlation coefficient for feature selection with cancer gene expression data
Cancer gene expression data is widely used for cancer subtype diagnosis. However, such data is often high-dimensional, multi-text and multi-class, which calls for an effective feature selection (FS) method. A multi-strategy fusion binary sea...
| Main Authors: | Yu-Cai Wang, Hao-Ming Song, Jie-Sheng Wang, Xin-Ru Ma, Yu-Wei Song, Yu-Liang Qi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-03-01 |
| Series: | Egyptian Informatics Journal |
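| Subjects: | Feature selection; Cancer gene expression; Sea-horse optimizer; Multi-strategy fusion; Pearson correlation coefficient |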
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1110866525000325 |
| _version_ | 1850073683159154688 |
|---|---|
| author | Yu-Cai Wang; Hao-Ming Song; Jie-Sheng Wang; Xin-Ru Ma; Yu-Wei Song; Yu-Liang Qi |
| author_sort | Yu-Cai Wang |
| collection | DOAJ |
| description | Cancer gene expression data is widely used for cancer subtype diagnosis. However, such data is often high-dimensional, multi-text and multi-class, which calls for an effective feature selection (FS) method. A multi-strategy fusion binary sea-horse optimizer guided by the Pearson correlation coefficient is proposed for FS on cancer gene expression data. The multi-strategy fusion has three parts. First, a rest strategy is introduced into the sea-horse motor behavior stage. Second, a search strategy based on symbiotic organisms of sea horses is designed for the predation stage. Third, an elementary function dynamic weight strategy is proposed. Together, these strategies let the sea-horse optimizer (SHO) balance exploitation and exploration dynamically in the early iterations, widening the search scope at first and narrowing it in the middle and later stages. This reduces the risk of the algorithm falling into a local optimum, increases its chance of escaping one, and avoids the blind search caused by elite influence. In the FS part, a Pearson correlation coefficient guided strategy is first proposed to add or delete features. Eight binary algorithms are then derived from S-type and V-type transfer functions. The simulation experiment was divided into four parts. First, the CEC-2022 test functions were used to test the multi-strategy fusion SHO variants, from which the best variant, TanASSHO, was selected and then compared with nine other swarm intelligence optimization algorithms. Performance tests of various algorithm variants on 18 UCI datasets show that V1PTASSHO is the most effective binary version. Finally, V1PTASSHO was compared with nine other swarm intelligence optimization algorithms on 18 cancer gene expression datasets. The results demonstrate that V1PTASSHO effectively reduces feature subsets, improves classification accuracy and obtains lower fitness values. The Friedman test and the Wilcoxon rank sum test were used for statistical analysis to verify the effectiveness of the proposed algorithm. |
| format | Article |
| id | doaj-art-4b7eb46172d74e2e98e9ff30b7ecbb1a |
| institution | DOAJ |
| issn | 1110-8665 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Egyptian Informatics Journal |
| spelling | doaj-art-4b7eb46172d74e2e98e9ff30b7ecbb1a; 2025-08-20T02:46:46Z; eng; Elsevier; Egyptian Informatics Journal; 1110-8665; 2025-03-01; 29; 100639; 10.1016/j.eij.2025.100639; Multi-strategy fusion binary SHO guided by Pearson correlation coefficient for feature selection with cancer gene expression data; Yu-Cai Wang, Hao-Ming Song, Jie-Sheng Wang (corresponding author), Xin-Ru Ma, Yu-Wei Song, Yu-Liang Qi (all: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan City, Liaoning Province, PR China); http://www.sciencedirect.com/science/article/pii/S1110866525000325; Feature selection; Cancer gene expression; Sea-horse optimizer; Multi-strategy fusion; Pearson correlation coefficient |
| title | Multi-strategy fusion binary SHO guided by Pearson correlation coefficient for feature selection with cancer gene expression data |
| title_sort | multi strategy fusion binary sho guided by pearson correlation coefficient for feature selection with cancer gene expression data |
| topic | Feature selection; Cancer gene expression; Sea-horse optimizer; Multi-strategy fusion; Pearson correlation coefficient |
| url | http://www.sciencedirect.com/science/article/pii/S1110866525000325 |
| work_keys_str_mv | AT yucaiwang multistrategyfusionbinaryshoguidedbypearsoncorrelationcoefficientforfeatureselectionwithcancergeneexpressiondata AT haomingsong multistrategyfusionbinaryshoguidedbypearsoncorrelationcoefficientforfeatureselectionwithcancergeneexpressiondata AT jieshengwang multistrategyfusionbinaryshoguidedbypearsoncorrelationcoefficientforfeatureselectionwithcancergeneexpressiondata AT xinruma multistrategyfusionbinaryshoguidedbypearsoncorrelationcoefficientforfeatureselectionwithcancergeneexpressiondata AT yuweisong multistrategyfusionbinaryshoguidedbypearsoncorrelationcoefficientforfeatureselectionwithcancergeneexpressiondata AT yuliangqi multistrategyfusionbinaryshoguidedbypearsoncorrelationcoefficientforfeatureselectionwithcancergeneexpressiondata |
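
The description mentions binary variants derived from S-type and V-type transfer functions and a Pearson correlation coefficient guided strategy that adds or deletes features. The record does not give the paper's formulas, so the following is only a minimal Python sketch under stated assumptions: a sigmoid stands in for the S-type family, |tanh| for the V-type family, and the guidance rule uses the absolute feature-to-label correlation. All function names (`s_shaped`, `v_shaped`, `binarize_s`, `binarize_v`, `pearson_relevance`, `pearson_guided_adjust`) are hypothetical and are not taken from the article.

```python
import numpy as np

def s_shaped(x):
    """Assumed S-type transfer function (sigmoid): maps a real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def v_shaped(x):
    """Assumed V-type transfer function |tanh(x)|: maps a real value into [0, 1)."""
    return np.abs(np.tanh(x))

def binarize_s(position, rng):
    """S-type rule: set each bit to 1 with probability S(position)."""
    return (rng.random(position.shape) < s_shaped(position)).astype(int)

def binarize_v(position, current_bits, rng):
    """V-type rule: flip each current bit with probability V(position)."""
    flip = rng.random(position.shape) < v_shaped(position)
    return np.where(flip, 1 - current_bits, current_bits)

def pearson_relevance(X, y):
    """Absolute Pearson correlation between each column of X and the label y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; give them zero relevance.
    denom = np.where(denom == 0, np.inf, denom)
    return np.abs(Xc.T @ yc / denom)

def pearson_guided_adjust(bits, relevance, n_add=1, n_drop=1):
    """Hypothetical guidance step: drop the least relevant selected features
    and add the most relevant unselected ones."""
    bits = bits.copy()
    selected = np.flatnonzero(bits == 1)
    unselected = np.flatnonzero(bits == 0)
    drop = selected[np.argsort(relevance[selected])[:n_drop]]
    add = unselected[np.argsort(relevance[unselected])[::-1][:n_add]]
    bits[drop] = 0
    bits[add] = 1
    return bits

# Small demonstration on synthetic data (not from the article).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
y = (X[:, 0] + 0.1 * rng.normal(size=20) > 0).astype(float)
position = rng.normal(size=6)          # continuous position of one search agent
bits = binarize_s(position, rng)       # binary feature mask
bits = pearson_guided_adjust(bits, pearson_relevance(X, y))
print(bits)
```

In this sketch the transfer function turns a continuous optimizer position into a feature mask, and the correlation step then nudges the mask toward features that correlate with the label. Whether the published method uses feature-to-label or feature-to-feature correlation is not stated in this record.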