Multi-stream feature fusion of vision transformer and CNN for precise epileptic seizure detection from EEG signals
Abstract Background Automated seizure detection based on scalp electroencephalography (EEG) can significantly accelerate the epilepsy diagnosis process. However, most existing deep learning-based epilepsy detection methods are deficient in mining the local features and global time series dependence...
| Main Authors: | Qi Li, Wei Cao, Anyuan Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-08-01 |
| Series: | Journal of Translational Medicine |
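| Subjects: | Multi-stream feature fusion (MSFF); Vision transformer (ViT); Convolutional neural network (CNN); Epileptic seizure detection; Electroencephalography (EEG) signals |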
| Online Access: | https://doi.org/10.1186/s12967-025-06862-z |
| _version_ | 1849761288810397696 |
|---|---|
| author | Qi Li, Wei Cao, Anyuan Zhang |
| collection | DOAJ |
| description | Abstract. Background: Automated seizure detection based on scalp electroencephalography (EEG) can significantly accelerate the epilepsy diagnosis process. However, most existing deep learning-based epilepsy detection methods are deficient in mining the local features and global time series dependence of EEG signals, limiting the performance enhancement of the models in seizure detection. Methods: Our study proposes an epilepsy detection model, CMFViT, based on a Multi-Stream Feature Fusion (MSFF) strategy that fuses a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). The model converts EEG signals into time-frequency domain images using the Tunable Q-factor Wavelet Transform (TQWT), and then utilizes the CNN module and the ViT module to capture local features and global time-series correlations, respectively. It fuses different feature representations through the MSFF strategy to enhance its discriminative ability, and finally completes the classification task through the average pooling layer and the fully connected layer. Results: The effectiveness of the model was validated by experimental evaluations on the publicly available CHB-MIT dataset and the Kaggle 121 people epilepsy dataset. The model achieved 98.85% classification accuracy and other excellent metrics in single-subject experiments on the CHB-MIT dataset, and also demonstrated strong performance in cross-subject experiments on the Kaggle dataset. Ablation experiments demonstrate the complementary roles of the CNN and ViT modules, and their integration significantly improves detection accuracy and generalization. Comparisons with other methods highlight the advantages of the CMFViT model. Conclusions: The CMFViT model provides an efficient, accurate, and innovative solution for complex EEG signal analysis and seizure detection tasks for single and cross-subjects while laying the foundation for developing real-time, accurate seizure detection systems. |
| format | Article |
| id | doaj-art-dc31ae80f27c4bd3bb29bea463fd3abc |
| institution | DOAJ |
| issn | 1479-5876 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | BMC |
| record_format | Article |
| series | Journal of Translational Medicine |
| spelling | Qi Li, Wei Cao, Anyuan Zhang (all: School of Computer Science and Technology, Changchun University of Science and Technology). Multi-stream feature fusion of vision transformer and CNN for precise epileptic seizure detection from EEG signals. Journal of Translational Medicine (BMC, ISSN 1479-5876), vol. 23, no. 1, pp. 1-23, 2025-08-01. https://doi.org/10.1186/s12967-025-06862-z |
| title | Multi-stream feature fusion of vision transformer and CNN for precise epileptic seizure detection from EEG signals |
| topic | Multi-stream feature fusion (MSFF); Vision transformer (ViT); Convolutional neural network (CNN); Epileptic seizure detection; Electroencephalography (EEG) signals |
| url | https://doi.org/10.1186/s12967-025-06862-z |
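
The description above outlines a two-stream design: a CNN branch for local features, a ViT branch for global correlations, a fusion step, then average pooling and a fully connected layer. The following is a minimal NumPy sketch of that general idea only, not the authors' CMFViT. Everything in it is an assumption made for illustration: the function names, the random array standing in for a TQWT time-frequency image, the single-head attention used in place of a full ViT, the plain concatenation used in place of the MSFF strategy, and the choice to pool each stream before fusing.

```python
import numpy as np


def local_stream(image, kernel):
    """Valid 2-D cross-correlation plus ReLU: a stand-in for a CNN branch."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)


def global_stream(image, patch):
    """Single-head self-attention over non-overlapping patches:
    a stand-in for a ViT branch (no learned projections)."""
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    tokens = (image[:rows * patch, :cols * patch]
              .reshape(rows, patch, cols, patch)
              .transpose(0, 2, 1, 3)
              .reshape(rows * cols, patch * patch))
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ tokens  # shape: (num_patches, patch * patch)


def classify(image, kernels, patch, w_fc, b_fc):
    """Pool each stream, fuse by concatenation (assumed), apply FC + softmax."""
    local_vec = np.array([local_stream(image, k).mean() for k in kernels])
    global_vec = global_stream(image, patch).mean(axis=0)
    fused = np.concatenate([local_vec, global_vec])
    logits = fused @ w_fc + b_fc
    logits = logits - logits.max()
    probs = np.exp(logits)
    return probs / probs.sum()


# Hypothetical inputs: random data in place of a real time-frequency image.
rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
kernels = rng.standard_normal((4, 3, 3))   # 4 local filters
patch = 4                                  # 4x4 patches -> 16 tokens of size 16
w_fc = rng.standard_normal((4 + 16, 2))    # 2 classes: seizure / non-seizure
b_fc = np.zeros(2)
probs = classify(image, kernels, patch, w_fc, b_fc)
print(probs)
```

With untrained random weights the printed probabilities carry no meaning; the sketch only shows how local and global feature vectors of different origin can be joined into one representation before classification.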