SVIT‐SSR: A sEMG‐based vision transformer approach for silent speech recognition



Bibliographic Details
Main Authors: Zhao Li, Bin Ma, Weifan Mao, Jianxing Zhang, Zhuting Yu, Yizhou Lu
Format: Article
Language: English
Published: Wiley 2024-11-01
Series: Electronics Letters
Online Access: https://doi.org/10.1049/ell2.13285
Description
Summary: Silent speech recognition (SSR) based on surface electromyography (sEMG) is a voice interaction technology proposed for scenarios that require silent operation. This article formulates sEMG-based SSR as a short‐term image sequence classification task. Time‐frequency domain feature extraction and data reconstruction are performed on the segmented muscle activity data. The temporal and spatial dimensions of the data are then analysed to capture the intrinsic correlations in muscle activity. On this basis, the SVIT‐SSR model is proposed, built on the vision transformer (VIT) framework. Finally, experiments are designed to recognize 33 types of typical silent speech commands in the SSR dataset. The results show that the proposed model achieves an accuracy of 96.67 ± 1.15%, outperforming comparable algorithms.
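The abstract does not say how the time‐frequency features are computed or how the data are reconstructed into images. Purely as an illustration, the Python sketch below assumes one plausible reading: a multichannel sEMG segment (channels × samples) is cut into short overlapping windows, each window is Hann‐tapered and converted into a one‐sided DFT magnitude spectrum per channel, giving one channels × frequency‐bins "image" per window; each image is then split into fixed‐size patches, as a ViT tokenizer does before its linear projection. The Hann window, window and hop lengths, patch sizes and all function names (`hann`, `magnitude_spectrum`, `to_image_sequence`, `patchify`) are hypothetical choices, not the authors' SVIT‐SSR pipeline, and the naive O(n²) DFT is used only to keep the example dependency‐free.

```python
import cmath
import math
from typing import List

# One "image": rows = sEMG channels, columns = frequency bins (assumed layout).
Matrix = List[List[float]]


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n (an assumed taper choice)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """One-sided DFT magnitude (bins 0..n//2) of a real-valued frame, computed naively."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def to_image_sequence(segment: List[List[float]], win: int, hop: int) -> List[Matrix]:
    """Hypothetical reconstruction: turn a multichannel segment (channels x samples)
    into a sequence of time-frequency images, one image per short-term window."""
    n_samples = len(segment[0])
    w = hann(win)
    images = []
    for start in range(0, n_samples - win + 1, hop):
        # Each row is the windowed spectrum of one channel over this time slice.
        image = [
            magnitude_spectrum([x * wi for x, wi in zip(ch[start:start + win], w)])
            for ch in segment
        ]
        images.append(image)
    return images


def patchify(image: Matrix, ph: int, pw: int) -> List[List[float]]:
    """Split one image into non-overlapping ph x pw patches, each flattened
    row-major into a token, as a ViT-style tokenizer would (edges that do not
    fill a whole patch are dropped)."""
    rows, cols = len(image), len(image[0])
    tokens = []
    for r in range(0, rows - ph + 1, ph):
        for c in range(0, cols - pw + 1, pw):
            tokens.append([image[r + i][c + j] for i in range(ph) for j in range(pw)])
    return tokens
```

For example, a 4‐channel, 64‐sample segment with `win=16` and `hop=8` yields 7 images of 4 × 9 values, and `patchify(image, 2, 3)` turns each image into 6 tokens of length 6. A practical implementation would more likely use an FFT routine such as NumPy's `numpy.fft.rfft` and learn a linear projection of each token.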
ISSN: 0013-5194, 1350-911X