High-precision lung cancer subtype diagnosis on imbalanced exosomal data via Exo-LCClassifier
Background and objective: Gene expression analysis plays a critical role in lung cancer research, offering molecular feature-based diagnostic insights that are particularly effective in distinguishing lung cancer subtypes. However, the high dimensionality and inherent imbalance of gene expression data...
| Main Authors: | Siyu Zhan, Hao Yu, Shuang Liu, Ke Qin, Lu Guo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-04-01 |
| Series: | Frontiers in Genetics |
| Subjects: | lung cancer; gene expression; WGAN; imbalanced data; DESeq2; 1D CNN |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fgene.2025.1583081/full |
| _version_ | 1849310693740773376 |
|---|---|
| author | Siyu Zhan, Hao Yu, Shuang Liu, Ke Qin, Lu Guo |
| author_sort | Siyu Zhan |
| collection | DOAJ |
| description | Background and objective: Gene expression analysis plays a critical role in lung cancer research, offering molecular feature-based diagnostic insights that are particularly effective in distinguishing lung cancer subtypes. However, the high dimensionality and inherent imbalance of gene expression data create significant challenges for accurate diagnosis. This study aims to address these challenges by proposing a deep learning-based method for predicting lung cancer subtypes. Methods: We propose a method called Exo-LCClassifier, which integrates feature selection, a one-dimensional convolutional neural network (1D CNN), and an improved Wasserstein Generative Adversarial Network (WGAN). First, differential gene expression analysis was performed using DESeq2 to identify significantly differentially expressed genes between normal and tumor tissues. Next, the enhanced WGAN was applied to augment the dataset, addressing the issue of sample imbalance and increasing the diversity of effective samples. Finally, a 1D CNN was used to classify the balanced dataset, thereby improving the model’s diagnostic accuracy. Results: The proposed method was evaluated using five-fold cross-validation, achieving an average accuracy of 0.9766 ± 0.0070, precision of 0.9762 ± 0.0101, recall of 0.9827 ± 0.0050, and F1-score of 0.9793 ± 0.0068. On an external GEO lung cancer dataset, it also showed strong performance, with an accuracy of 0.9588, precision of 0.9558, recall of 0.9678, and F1-score of 0.9616. Conclusion: This study addresses the challenge of imbalanced learning in lung cancer gene expression analysis through a computational framework that integrates three techniques: (1) DESeq2 for differential expression analysis, (2) WGAN for data augmentation, and (3) 1D CNN for feature learning and classification. The source code is publicly available at: https://github.com/lanlinxxs/Exo-classifier. |
| format | Article |
| id | doaj-art-dbde509b703641908eff399c2ea4ae9b |
| institution | Kabale University |
| issn | 1664-8021 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Genetics |
| spelling | doaj-art-dbde509b703641908eff399c2ea4ae9b; 2025-08-20T03:53:39Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2025-04-01; 16; 10.3389/fgene.2025.1583081; 1583081; High-precision lung cancer subtype diagnosis on imbalanced exosomal data via Exo-LCClassifier; Siyu Zhan (0), Siyu Zhan (1), Hao Yu (2), Shuang Liu (3), Ke Qin (4), Lu Guo (5); (0) Institute of Intelligent Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, China; (1) Trusted Cloud Computing and Big Data Key Laboratory of Sichuan Province, Chengdu, Sichuan, China; (2) School of Optoelectronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; (3) Yingcai Experimental College, University of Electronic Science and Technology of China, Chengdu, China; (4) Institute of Intelligent Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, China; (5) Department of Pulmonary and Critical Care Medicine, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China; https://www.frontiersin.org/articles/10.3389/fgene.2025.1583081/full; lung cancer; gene expression; WGAN; imbalanced data; DESeq2; 1D CNN |
| title | High-precision lung cancer subtype diagnosis on imbalanced exosomal data via Exo-LCClassifier |
| topic | lung cancer; gene expression; WGAN; imbalanced data; DESeq2; 1D CNN |
| url | https://www.frontiersin.org/articles/10.3389/fgene.2025.1583081/full |
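The description reports accuracy, precision, recall, and F1-score as mean ± standard deviation over five-fold cross-validation. The following is a minimal sketch of how such figures can be computed, not the authors' code. It assumes a binary label encoding and the sample standard deviation (the record specifies neither), and the function names `binary_metrics` and `summarize` and the toy labels are hypothetical.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one fold (binary case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def summarize(fold_metrics):
    """Aggregate per-fold metric dicts into {name: (mean, sample std)}."""
    summary = {}
    for name in fold_metrics[0]:
        values = [fold[name] for fold in fold_metrics]
        summary[name] = (mean(values), stdev(values))
    return summary


if __name__ == "__main__":
    # Hypothetical toy predictions for two folds (illustration only).
    folds = [
        binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]),
        binary_metrics([1, 1, 0, 0], [1, 1, 0, 0]),
    ]
    for name, (avg, sd) in summarize(folds).items():
        print(f"{name}: {avg:.4f} ± {sd:.4f}")
```

For a multi-class subtype task, the same per-fold counts would be computed per class and then averaged (for example, macro-averaged); which averaging the paper uses is not stated in this record.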