Multi-strategy fusion binary SHO guided by Pearson correlation coefficient for feature selection with cancer gene expression data

Cancer gene expression data is extensively utilized to address the challenges of cancer subtype diagnosis. However, this data is often characterized by high dimensionality, multi-text and multi-classification, which requires an effective feature selection (FS) method. A multi-strategy fusion binary sea...

Bibliographic Details
Main Authors: Yu-Cai Wang, Hao-Ming Song, Jie-Sheng Wang, Xin-Ru Ma, Yu-Wei Song, Yu-Liang Qi
Format: Article
Language: English
Published: Elsevier 2025-03-01
Series: Egyptian Informatics Journal
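Subjects: Feature selection; Cancer gene expression; Sea-horse optimizer; Multi-strategy fusion; Pearson correlation coefficient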
Online Access: http://www.sciencedirect.com/science/article/pii/S1110866525000325
author Yu-Cai Wang
Hao-Ming Song
Jie-Sheng Wang
Xin-Ru Ma
Yu-Wei Song
Yu-Liang Qi
collection DOAJ
description Cancer gene expression data is widely used for cancer subtype diagnosis. However, such data is typically high-dimensional, multi-text and multi-class, which calls for an effective feature selection (FS) method. A multi-strategy fusion binary sea-horse optimizer (SHO) guided by the Pearson correlation coefficient is proposed for FS on cancer gene expression data. The multi-strategy fusion has three parts. First, a rest strategy is introduced in the sea-horse motor behavior stage. Second, a search strategy based on symbiotic organisms of sea horses is designed for the predation stage. Third, an elementary-function dynamic weight strategy is proposed. Together, these strategies let the SHO balance exploitation and exploration dynamically, widening the search scope early in the iterations and narrowing it in the middle and later stages; this lowers the risk of falling into a local optimum, raises the chance of escaping one, and avoids the blind search caused by elite influence. In the FS part, a Pearson correlation coefficient guided strategy is first proposed to add or delete features, and eight binary algorithms are then derived from S-type and V-type transfer functions. The simulation experiments were divided into four parts. First, the CEC-2022 test functions were used to evaluate the multi-strategy fusion SHO variants, from which the best variant, TanASSHO, was selected. Second, TanASSHO was compared with nine other swarm intelligence optimization algorithms. Third, performance tests of the binary variants on 18 UCI datasets showed that V1PTASSHO is the most effective binary version. Finally, V1PTASSHO was compared with nine other swarm intelligence optimization algorithms on 18 cancer gene expression datasets. The results show that V1PTASSHO effectively reduces the feature subset size, improves classification accuracy and obtains lower fitness values. The Friedman test and the Wilcoxon rank-sum test were used for statistical analysis to verify the effectiveness of the proposed algorithm.
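The abstract mentions S-type and V-type transfer functions and a Pearson correlation coefficient without giving formulas. The following is a minimal sketch of the general technique only, not the authors' implementation: it assumes the logistic sigmoid as the S-type function and |tanh(x)| as the V-type function (two commonly used forms), and all function names (s_shaped, v_shaped, binarize_s, binarize_v, pearson) are hypothetical. How the paper's guided strategy uses the coefficient to add or delete features is not stated in this record, so only the coefficient itself is shown.

```python
import math
import random


def s_shaped(x: float) -> float:
    """Assumed S-type transfer function: logistic sigmoid, computed stably."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def v_shaped(x: float) -> float:
    """Assumed V-type transfer function: |tanh(x)|."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-type rule: each bit becomes 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, previous_bits, rng):
    """V-type rule: each previous bit is flipped with probability v_shaped(x)."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, previous_bits)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence is constant (undefined correlation).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    position = [-2.0, 0.0, 3.5]           # continuous search-agent position
    mask = binarize_s(position, rng)      # 1 = feature selected, 0 = dropped
    mask = binarize_v(position, mask, rng)
    print(mask)
```

In a binary feature selection setting of this kind, each bit of the resulting mask marks whether the corresponding gene (feature) is kept; the S-type rule sets bits directly, while the V-type rule decides whether to flip the bits from the previous iteration.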
id doaj-art-4b7eb46172d74e2e98e9ff30b7ecbb1a
institution DOAJ
issn 1110-8665
spelling doaj-art-4b7eb46172d74e2e98e9ff30b7ecbb1a
2025-08-20T02:46:46Z
eng
Elsevier
Egyptian Informatics Journal
1110-8665
2025-03-01
Volume 29, article 100639
DOI: 10.1016/j.eij.2025.100639
Yu-Cai Wang, Hao-Ming Song, Jie-Sheng Wang (corresponding author), Xin-Ru Ma, Yu-Wei Song, Yu-Liang Qi
Affiliation (all authors): School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan City, Liaoning Province, PR China
title Multi-strategy fusion binary SHO guided by Pearson correlation coefficient for feature selection with cancer gene expression data
topic Feature selection
Cancer gene expression
Sea-horse optimizer
Multi-strategy fusion
Pearson correlation coefficient