High-precision lung cancer subtype diagnosis on imbalanced exosomal data via Exo-LCClassifier

Background and objective: Gene expression analysis plays a critical role in lung cancer research, offering molecular feature-based diagnostic insights that are particularly effective in distinguishing lung cancer subtypes. However, the high dimensionality and inherent imbalance of gene expression data...

Bibliographic Details
Main Authors: Siyu Zhan, Hao Yu, Shuang Liu, Ke Qin, Lu Guo
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-04-01
Series: Frontiers in Genetics
Subjects: lung cancer; gene expression; WGAN; imbalanced data; DESeq2; 1D CNN
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2025.1583081/full
author Siyu Zhan
Hao Yu
Shuang Liu
Ke Qin
Lu Guo
collection DOAJ
description Background and objective: Gene expression analysis plays a critical role in lung cancer research, offering molecular feature-based diagnostic insights that are particularly effective in distinguishing lung cancer subtypes. However, the high dimensionality and inherent imbalance of gene expression data create significant challenges for accurate diagnosis. This study aims to address these challenges by proposing an innovative deep learning-based method for predicting lung cancer subtypes.
Methods: We propose a method called Exo-LCClassifier, which integrates feature selection, one-dimensional convolutional neural networks (1D CNN), and an improved Wasserstein Generative Adversarial Network (WGAN). First, differential gene expression analysis was performed using DESeq2 to identify significantly expressed genes from both normal and tumor tissues. Next, the enhanced WGAN was applied to augment the dataset, addressing the issue of sample imbalance and increasing the diversity of effective samples. Finally, a 1D CNN was used to classify the balanced dataset, thereby improving the model’s diagnostic accuracy.
Results: The proposed method was evaluated using five-fold cross-validation, achieving an average accuracy of 0.9766 ± 0.0070, precision of 0.9762 ± 0.0101, recall of 0.9827 ± 0.0050, and F1-score of 0.9793 ± 0.0068. On an external GEO lung cancer dataset, it also showed strong performance with an accuracy of 0.9588, precision of 0.9558, recall of 0.9678, and F1-score of 0.9616.
Conclusion: This study addresses the critical challenge of imbalanced learning in lung cancer gene expression analysis through an innovative computational framework. Our solution integrates three advanced techniques: (1) DESeq2 for differential expression analysis, (2) WGAN for data augmentation, and (3) 1D CNN for feature learning and classification. The source code is publicly available at: https://github.com/lanlinxxs/Exo-classifier.
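The Results figures in the description are reported as mean ± standard deviation over five folds. The following is a minimal sketch of how per-fold accuracy, precision, recall and F1-score could be computed and aggregated in that form. It assumes binary labels and the sample standard deviation; the record does not state which estimator the authors used. The function names `binary_metrics` and `summarize` are hypothetical, and the toy labels are illustrative only, not data from the study.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one fold (binary labels)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators (e.g. no positive predictions in a fold).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def summarize(folds):
    """Aggregate per-fold metric dicts into (mean, sample std) per metric.

    Assumption: sample standard deviation (n - 1 denominator).
    Requires at least two folds.
    """
    return {
        name: (mean([f[name] for f in folds]), stdev([f[name] for f in folds]))
        for name in folds[0]
    }


if __name__ == "__main__":
    # Two toy folds with made-up labels, purely to exercise the functions.
    toy_folds = [
        binary_metrics([1, 1, 0, 0], [1, 1, 0, 0]),
        binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]),
    ]
    for name, (m, s) in summarize(toy_folds).items():
        print(f"{name}: {m:.4f} ± {s:.4f}")
```

In a real five-fold run, `summarize` would receive five metric dictionaries, one per held-out fold, each computed from that fold's classifier predictions.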
format Article
id doaj-art-dbde509b703641908eff399c2ea4ae9b
institution Kabale University
issn 1664-8021
language English
publishDate 2025-04-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj-art-dbde509b703641908eff399c2ea4ae9b; 2025-08-20T03:53:39Z; eng; Frontiers Media S.A.; Frontiers in Genetics; ISSN 1664-8021; 2025-04-01; vol. 16; doi:10.3389/fgene.2025.1583081; article 1583081
High-precision lung cancer subtype diagnosis on imbalanced exosomal data via Exo-LCClassifier
Siyu Zhan (Institute of Intelligent Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, China; Trusted Cloud Computing and Big Data Key Laboratory of Sichuan Province, Chengdu, Sichuan, China)
Hao Yu (School of Optoelectronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China)
Shuang Liu (Yingcai Experimental College, University of Electronic Science and Technology of China, Chengdu, China)
Ke Qin (Institute of Intelligent Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, China)
Lu Guo (Department of Pulmonary and Critical Care Medicine, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China)
title High-precision lung cancer subtype diagnosis on imbalanced exosomal data via Exo-LCClassifier
topic lung cancer
gene expression
WGAN
imbalanced data
DESeq2
1D CNN
url https://www.frontiersin.org/articles/10.3389/fgene.2025.1583081/full