LASSO-mCGA: Machine Learning and Modified Compact Genetic Algorithm-Based Biomarker Selection for Breast Cancer Subtype Classification

Bibliographic Details
Main Authors: Nimisha Ghosh, Sankar Kumar Mridha, Rourab Paul
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/10848104/
Collection: DOAJ

Abstract: Breast cancer is the most common cancer among women and one of the leading causes of death worldwide. Because it is a heterogeneous disease, subtyping plays a vital role in its treatment, and gene expression is central to subtyping. In this work, gene expression data are used to identify the most significant gene biomarkers, each highly associated with a specific breast cancer subtype: Luminal A, Luminal B, HER2-Enriched, or Basal-Like. First, LASSO is applied together with four machine learning models, namely Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbours (KNN) and Naive Bayes (NB), to obtain an initial reduced set of genes and to select the best learning model by classification accuracy, which is SVM in this case. Next, a Modified Compact Genetic Algorithm (mCGA) is run to identify the final set of biomarker genes for each subtype. Experimental results suggest that the proposed method achieves AUC-ROC values of 0.9878 and 0.97311 for the LumA and LumB subtypes, respectively, and 1 for the Basal and HER2 subtypes. To validate the biological significance of the identified biomarkers, KEGG pathway and GO enrichment analyses are carried out.

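The abstract names the mCGA step but does not say what it modifies, so the following is only a minimal sketch, assuming the standard compact genetic algorithm of Harik, Lobo and Goldberg as the base: each candidate gene subset is a binary mask, and a probability vector (one entry per gene) is nudged toward the fitter of two sampled masks. The names `compact_ga` and `toy_fitness`, the virtual population size of 50, and the toy objective with three hypothetical "informative" genes are illustrative assumptions, not the authors' code; in the pipeline described above, the fitness would more plausibly be a classifier score (for example SVM accuracy) on the LASSO-reduced genes.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = gene selected, 0 = gene dropped


def compact_ga(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 50,
    max_iters: int = 10_000,
    seed: int = 0,
) -> Mask:
    """Standard compact GA over binary feature masks (NOT the paper's mCGA).

    Instead of storing a population, the cGA keeps one probability per gene
    and shifts it by 1/pop_size toward the fitter of two sampled masks.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features  # P(gene i is selected)
    step = 1.0 / pop_size  # update size of the "virtual" population

    def sample() -> Mask:
        return [1 if rng.random() < p else 0 for p in probs]

    for _ in range(max_iters):
        a, b = sample(), sample()
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_features):
            if winner[i] != loser[i]:
                # move the probability toward the winner's bit, clamped to [0, 1]
                delta = step if winner[i] == 1 else -step
                probs[i] = min(1.0, max(0.0, probs[i] + delta))
        # stop once every probability has (numerically) saturated at 0 or 1
        if all(p < 1e-9 or p > 1.0 - 1e-9 for p in probs):
            break
    return [1 if p >= 0.5 else 0 for p in probs]


# Hypothetical toy problem: genes 0, 3 and 7 are the "informative" ones.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: Mask) -> float:
    # +1 for each informative gene kept, -1 for each uninformative gene kept
    return sum(1 if i in INFORMATIVE else -1 for i, bit in enumerate(mask) if bit)


if __name__ == "__main__":
    print(compact_ga(10, toy_fitness))
```

Storing one probability per gene instead of an explicit population keeps memory linear in the number of candidate genes; `pop_size` only sets the update step 1/pop_size, so larger values trade speed for a lower risk of converging on a wrong bit.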

ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3532361
Citation: IEEE Access, vol. 13, pp. 17673-17682, 2025
Affiliations:
Nimisha Ghosh: Department of Computer Science and Engineering, Centre of Internet of Things, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, India
Sankar Kumar Mridha: Department of Computer Science and Information Technology, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, India
Rourab Paul (ORCID https://orcid.org/0000-0001-5322-281X): Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, India
Subjects: Biomarker; breast cancer; feature selection; LASSO; modified compact genetic algorithm