Optimized CNN-Bi-LSTM–Based BCI System for Imagined Speech Recognition Using FOA-DWT


Bibliographic Details
Main Authors: Meenakshi Bisla, Radhey Shyam Anand
Format: Article
Language: English
Published: Wiley, 2024-01-01
Series: Advances in Human-Computer Interaction
Online Access: http://dx.doi.org/10.1155/2024/8742261
Collection: DOAJ
Description: Speech imagery is emerging as a significant neuro-paradigm for designing electroencephalography (EEG)-based brain-computer interface (BCI) systems for rehabilitation, medical neurology, and aiding people with disabilities in interacting with their surroundings. Neural correlates of imagined speech EEG signals are weak and variable compared with those of overt speech; hence, they are challenging to interpret using machine learning (ML)-based classifiers. Modern deep learning methods such as convolutional neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM) networks have brought substantial advances in complex EEG signal analysis compared with ML-based methods. The objective of this article is to design a firefly-optimized discrete wavelet transform (DWT) and CNN-Bi-LSTM-based imagined speech recognition (ISR) system to interpret imagined speech EEG signals. The study uses two publicly available datasets. The EEG signal is enhanced by firefly optimization algorithm (FOA)-optimized soft thresholding of the high-frequency detail coefficients obtained from DWT decomposition. The enhanced signal is then augmented with sliding-window data augmentation to increase the amount of training data. Frequency-domain features such as power spectral density (PSD), frequency band power (FBP), band ratios, peak frequency, mean frequency, median frequency, spectral entropy, and relative power are extracted from the augmented EEG segments. The extracted feature vector is fed to the designed CNN-Bi-LSTM classifier, which classifies the EEG data into two-class, three-class, and four-class categories. To achieve optimal performance, the CNN-Bi-LSTM model was tuned using the Keras Tuner library.
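As a rough illustration of the denoising step described above (not the authors' implementation: it uses a one-level Haar DWT instead of a multilevel decomposition, a toy firefly search, and a synthetic signal; all parameter values are assumptions), FOA-optimized soft thresholding of DWT detail coefficients can be sketched as:

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: returns approximation and detail coefficients."""
    x = x[: len(x) // 2 * 2]                        # truncate to even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    """Inverse one-level Haar DWT."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t (soft thresholding)."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, t):
    """Threshold only the high-frequency detail band, as in the abstract."""
    a, d = haar_dwt(x)
    return haar_idwt(a, soft_threshold(d, t))

def firefly_threshold(noisy, reference, n=8, iters=25,
                      beta0=1.0, gamma=4.0, alpha=0.1, seed=0):
    """Toy 1-D firefly search for the soft threshold that minimizes MSE
    against a reference signal (brightness = negative MSE)."""
    rng = np.random.default_rng(seed)
    def mse(t):
        return np.mean((denoise(noisy, t) - reference) ** 2)
    pos = rng.uniform(0.0, 1.5, n)                  # candidate thresholds
    cost = np.array([mse(t) for t in pos])
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:               # move toward brighter firefly
                    beta = beta0 * np.exp(-gamma * (pos[i] - pos[j]) ** 2)
                    pos[i] += beta * (pos[j] - pos[i]) + alpha * rng.normal()
                    pos[i] = np.clip(pos[i], 0.0, 1.5)
                    cost[i] = mse(pos[i])
        alpha *= 0.97                               # shrink the random walk
    best = int(np.argmin(cost))
    return pos[best], cost[best]

# synthetic demo: a 10 Hz "EEG-like" sine plus Gaussian noise at fs = 256 Hz
rng = np.random.default_rng(1)
tt = np.arange(512) / 256.0
clean = np.sin(2 * np.pi * 10 * tt)
noisy = clean + 0.4 * rng.normal(size=clean.size)
best_t, best_mse = firefly_threshold(noisy, clean)
```

In the paper's pipeline a multilevel DWT with a mother wavelet suited to EEG would replace the Haar step, and the firefly objective would be a denoising criterion rather than MSE against a clean signal, which is unavailable in practice.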
The designed CNN consists of one-dimensional (1-D) convolutional and max-pooling layers that capture local associations and mine hierarchical relationships, while the Bi-LSTM network learns long-term dependencies from the features extracted by the CNN. The Bi-LSTM network improves performance and obtains potentially richer representations by processing the sequence in both forward and backward directions, capturing dependencies that a unidirectional model alone might miss. The performance of the designed FOA-DWT-CNN-Bi-LSTM-based ISR system is assessed using four evaluation measures: accuracy, F1 score, recall, and precision. The proposed system achieves the highest classification accuracies of 99.43 ± 2.5%, 94.41 ± 3.31%, and 89.57 ± 4.3% for the two-class, three-class, and four-class categories, respectively.
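The sliding-window augmentation and frequency-domain feature extraction steps listed in the description can be sketched as follows. This is a minimal illustration, not the authors' code: the sampling rate, window parameters, band edges, and the simple periodogram PSD estimate are assumptions, and only a subset of the listed features is shown.

```python
import numpy as np

def sliding_windows(x, win, step):
    """Sliding-window augmentation: overlapping segments from one trial."""
    return np.stack([x[i:i + win] for i in range(0, len(x) - win + 1, step)])

def spectral_features(seg, fs=256.0):
    """A few of the frequency-domain features named in the abstract,
    computed from a simple periodogram PSD estimate."""
    freqs = np.fft.rfftfreq(len(seg), 1.0 / fs)
    psd = np.abs(np.fft.rfft(seg)) ** 2 / len(seg)
    total = psd.sum()
    p = psd / total                                  # normalized spectrum
    def band_power(lo, hi):
        m = (freqs >= lo) & (freqs < hi)
        return psd[m].sum()
    alpha, beta = band_power(8, 13), band_power(13, 30)
    cdf = np.cumsum(psd)                             # for the median frequency
    return {
        "peak_freq": freqs[np.argmax(psd)],
        "mean_freq": (freqs * psd).sum() / total,
        "median_freq": freqs[np.searchsorted(cdf, total / 2.0)],
        "spectral_entropy": -(p * np.log2(p + 1e-12)).sum(),
        "alpha_power": alpha,
        "beta_power": beta,
        "alpha_beta_ratio": alpha / (beta + 1e-12),
        "rel_alpha": alpha / total,                  # relative band power
    }

# demo: a 2 s, 10 Hz (alpha-band) sine sampled at 256 Hz,
# cut into 1 s windows with 50% overlap
fs = 256
t = np.arange(2 * fs) / fs
trial = np.sin(2 * np.pi * 10 * t)
segs = sliding_windows(trial, win=fs, step=fs // 2)
feats = spectral_features(segs[0], fs=fs)
```

Stacking such features across channels and windows yields the feature vectors fed to the CNN-Bi-LSTM classifier; in practice a smoothed PSD estimate (e.g., Welch's method) and all eight feature families from the abstract would be used.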
ISSN: 1687-5907
Record ID: doaj-art-d5ceef6431ff400986fa723813da9b4e
Author Affiliation: Indian Institute of Technology