Multi-stream feature fusion of vision transformer and CNN for precise epileptic seizure detection from EEG signals

Bibliographic Details
Main Authors: Qi Li, Wei Cao, Anyuan Zhang
Format: Article
Language: English
Published: BMC, 2025-08-01
Series: Journal of Translational Medicine
Online Access: https://doi.org/10.1186/s12967-025-06862-z

Abstract

Background: Automated seizure detection based on scalp electroencephalography (EEG) can significantly accelerate epilepsy diagnosis. However, most existing deep learning-based seizure detection methods do not adequately capture both the local features and the global temporal dependencies of EEG signals, which limits their detection performance.

Methods: This study proposes CMFViT, a seizure detection model built on a Multi-Stream Feature Fusion (MSFF) strategy that combines a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). The model converts EEG signals into time-frequency images using the Tunable Q-factor Wavelet Transform (TQWT); the CNN module then captures local features, while the ViT module captures global time-series correlations. The MSFF strategy fuses these feature representations to improve discriminative ability, and classification is performed by an average pooling layer followed by a fully connected layer.

Results: The model was evaluated on the publicly available CHB-MIT dataset and on the Kaggle epilepsy dataset of 121 people. In single-subject experiments on CHB-MIT it achieved 98.85% classification accuracy, with strong results on the other evaluation metrics, and it also performed strongly in cross-subject experiments on the Kaggle dataset. Ablation experiments show that the CNN and ViT modules play complementary roles and that combining them significantly improves detection accuracy and generalization. Comparisons with other methods highlight the advantages of CMFViT.

Conclusions: CMFViT provides an efficient, accurate, and innovative solution for complex EEG signal analysis and for single-subject and cross-subject seizure detection, and lays a foundation for developing real-time, accurate seizure detection systems.
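The two-stream design summarized in the abstract (a CNN stream for local features, a ViT stream for global correlations, fusion, then average pooling and a fully connected layer) can be illustrated with a minimal NumPy sketch. This is not the authors' CMFViT implementation. The layer sizes, the single 3x3 convolution, the single-head attention block, the random untrained weights, and fusion by concatenation on a shared patch grid are all assumptions made for illustration; a random array stands in for a TQWT time-frequency image, and TQWT itself is not implemented. The helper names (`cnn_stream`, `vit_stream`, `fuse_and_classify`) are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def cnn_stream(img, kernels):
    """Local stream (assumed): 'same'-padded 2-D convolution + ReLU -> (C, H, W)."""
    k = kernels.shape[-1]                            # odd kernel size assumed
    padded = np.pad(img, k // 2)                     # zero padding
    windows = sliding_window_view(padded, (k, k))    # (H, W, k, k)
    fmap = np.einsum("ijab,cab->cij", windows, kernels)
    return np.maximum(fmap, 0.0)


def patchify(img, p):
    """Split an (H, W) image into non-overlapping p x p patches -> (N, p*p)."""
    n_h, n_w = img.shape[0] // p, img.shape[1] // p
    img = img[: n_h * p, : n_w * p]
    return img.reshape(n_h, p, n_w, p).transpose(0, 2, 1, 3).reshape(n_h * n_w, p * p)


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def vit_stream(img, p, params):
    """Global stream (assumed): patch embedding + one pre-norm self-attention block -> (N, D)."""
    tokens = patchify(img, p) @ params["w_embed"] + params["pos"]
    h = layer_norm(tokens)
    q, k, v = h @ params["wq"], h @ params["wk"], h @ params["wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # row-wise softmax
    return tokens + attn @ v                         # residual connection


def fuse_and_classify(img, p, kernels, params, w_fc, b_fc):
    """Pool the CNN map onto the patch grid, concatenate with the ViT tokens,
    average-pool over tokens, then apply a fully connected layer + softmax."""
    fmap = cnn_stream(img, kernels)
    c = fmap.shape[0]
    n_h, n_w = img.shape[0] // p, img.shape[1] // p
    local = fmap[:, : n_h * p, : n_w * p].reshape(c, n_h, p, n_w, p).mean(axis=(2, 4))
    local = local.reshape(c, n_h * n_w).T            # (N, C), same token order as patchify
    global_ = vit_stream(img, p, params)             # (N, D)
    fused = np.concatenate([local, global_], axis=1) # (N, C + D)
    pooled = fused.mean(axis=0)                      # average pooling over tokens
    logits = pooled @ w_fc + b_fc                    # fully connected layer
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                       # seizure / non-seizure probabilities


# Usage with random stand-in data and untrained weights (outputs are not meaningful).
rng = np.random.default_rng(0)
H, W, P, C, D = 32, 32, 8, 4, 16
N = (H // P) * (W // P)
img = rng.standard_normal((H, W))                    # placeholder for a TQWT image
kernels = rng.standard_normal((C, 3, 3)) * 0.1
params = {
    "w_embed": rng.standard_normal((P * P, D)) * 0.1,
    "pos": rng.standard_normal((N, D)) * 0.02,
    "wq": rng.standard_normal((D, D)) * 0.1,
    "wk": rng.standard_normal((D, D)) * 0.1,
    "wv": rng.standard_normal((D, D)) * 0.1,
}
w_fc = rng.standard_normal((C + D, 2)) * 0.1
b_fc = np.zeros(2)
probs = fuse_and_classify(img, P, kernels, params, w_fc, b_fc)
print(probs.shape)  # (2,)
```

Pooling the CNN feature map onto the ViT patch grid is just one simple way to give both streams the same number of tokens so they can be concatenated before the average pooling and fully connected layers, in the order the abstract gives; the paper's MSFF strategy may combine the streams differently.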

ISSN: 1479-5876
Affiliation (all authors): School of Computer Science and Technology, Changchun University of Science and Technology
Keywords: Multi-stream feature fusion (MSFF); Vision transformer (ViT); Convolutional neural network (CNN); Epileptic seizure detection; Electroencephalography (EEG) signals