Alignment-Enhanced Interactive Fusion Model for Complete and Incomplete Multimodal Hand Gesture Recognition

Hand gesture recognition (HGR) based on surface electromyogram (sEMG) and accelerometer (ACC) signals is increasingly attractive, and fusion strategies, while crucial to performance, remain challenging. Neural network-based fusion methods currently achieve superior performance. Nevertheless, these methods typically fuse sEMG and ACC either in the early or late stages, overlooking the integration of cross-modal hierarchical information within each individual hidden layer and thereby inducing inefficient inter-modal fusion. To this end, we propose a novel Alignment-Enhanced Interactive Fusion (AiFusion) model, which achieves effective fusion via a progressive hierarchical fusion strategy. Notably, AiFusion can flexibly perform both complete and incomplete multimodal HGR. Specifically, AiFusion contains two unimodal branches and a cascaded transformer-based multimodal fusion branch. The fusion branch is designed to adequately characterize modality-interactive knowledge by adaptively capturing inter-modal similarity and fusing hierarchical features from all branches layer by layer. The modality-interactive knowledge is then aligned with that of each unimodal branch using cross-modal supervised contrastive learning and online distillation in the embedding and probability spaces, respectively. These alignments further improve fusion quality and refine modality-specific representations. Finally, recognition outcomes are determined by the available modalities, which handles the incomplete multimodal HGR problem frequently encountered in real-world scenarios. Experimental results on five public datasets demonstrate that AiFusion outperforms most state-of-the-art benchmarks in complete multimodal HGR. Impressively, it also surpasses the unimodal baselines in the challenging incomplete multimodal HGR setting. AiFusion thus provides a promising solution for effective and robust multimodal HGR-based interfaces.
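
The progressive hierarchical fusion strategy can be pictured as a fusion branch that, at every depth, cross-attends to the hidden states of both unimodal branches. The following is a minimal PyTorch sketch of one such layer; the module name, dimensions, and the use of multi-head cross-attention are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of one layer of a progressive hierarchical fusion
# branch: the fused stream cross-attends to same-depth sEMG and ACC hidden
# states, so inter-modal similarity is captured adaptively by the attention
# weights. All names and sizes are assumptions for illustration.
import torch
import torch.nn as nn

class InteractiveFusionLayer(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, fused, h_emg, h_acc):
        # Stack both unimodal hidden states as the attention context so the
        # fused tokens can draw on either modality at this depth.
        context = torch.cat([h_emg, h_acc], dim=1)
        attn_out, _ = self.cross_attn(fused, context, context)
        fused = self.norm1(fused + attn_out)
        return self.norm2(fused + self.ffn(fused))
```

Cascading several such layers, each fed the corresponding layer's unimodal features, gives the layer-by-layer fusion the abstract describes.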
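In the embedding space, the cross-modal supervised contrastive alignment can be read as pulling fused and unimodal embeddings of the same gesture class together. Below is a generic SupCon-style loss under that reading; the temperature, positive-pair definition, and absence of projection heads are assumptions, and the paper's exact formulation may differ.

```python
# Generic supervised-contrastive alignment between fusion-branch embeddings
# and one unimodal branch's embeddings. Not the paper's exact loss.
import torch
import torch.nn.functional as F

def cross_modal_supcon(z_fused, z_uni, labels, tau: float = 0.1):
    """z_fused, z_uni: (N, d) embeddings; labels: (N,) gesture class ids.
    Same-class cross-modal pairs are treated as positives."""
    z_fused = F.normalize(z_fused, dim=1)
    z_uni = F.normalize(z_uni, dim=1)
    sim = z_fused @ z_uni.t() / tau                      # (N, N) similarities
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Mean log-likelihood over each anchor's positive set.
    return -((log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)).mean()
```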
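In the probability space, online distillation commonly amounts to a KL term between temperature-softened class distributions of two jointly trained branches, rather than distillation from a fixed pre-trained teacher. A sketch of that standard formulation follows; the temperature and the detach on the teacher side are assumptions.

```python
# Standard online-distillation term between two jointly trained heads.
import torch.nn.functional as F

def online_distill(student_logits, teacher_logits, T: float = 2.0):
    """KL between softened distributions; the 'teacher' side is detached for
    this term because both branches train simultaneously ('online')."""
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    # T^2 rescales gradients to match the hard-label loss, as in Hinton et al.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```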
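Because each branch retains its own classifier, test-time prediction can simply route to whichever head matches the modalities actually present, which is how the model covers the incomplete multimodal case. A hypothetical dispatch wrapper (class and branch names are assumptions) is sketched below.

```python
# Hypothetical inference wrapper: dispatch by modality availability.
class AvailabilityAwareHGR:
    """The three trained branches are passed in as callables that map input
    tensors to gesture class scores."""

    def __init__(self, emg_branch, acc_branch, fusion_branch):
        self.emg_branch = emg_branch
        self.acc_branch = acc_branch
        self.fusion_branch = fusion_branch

    def predict(self, emg=None, acc=None):
        if emg is not None and acc is not None:
            return self.fusion_branch(emg, acc)  # complete multimodal HGR
        if emg is not None:
            return self.emg_branch(emg)          # ACC stream missing
        if acc is not None:
            return self.acc_branch(acc)          # sEMG stream missing
        raise ValueError("at least one modality must be provided")
```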

Bibliographic Details
Main Authors: Shengcai Duan, Le Wu, Aiping Liu, Xun Chen
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 4661-4671
ISSN: 1534-4320, 1558-0210
DOI: 10.1109/TNSRE.2023.3335101
Subjects: Multimodal fusion; hand gesture recognition; myoelectric control; accelerometer; incomplete multimodal; alignment
Online Access: https://ieeexplore.ieee.org/document/10323506/
Author Affiliations:
Shengcai Duan (ORCID 0000-0002-7582-3891): School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Le Wu (ORCID 0000-0002-8565-9626): School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Aiping Liu (ORCID 0000-0001-8849-5228): School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Xun Chen (ORCID 0000-0002-4922-8116): Department of Neurosurgery, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, and Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China