Dual-stream interactive mechanism with multi-modal hierarchical aggregation transformer for gait recognition

Bibliographic Details
Main Authors: Jinghang Liu, Xiangyuan Xu, Yan Qiu, Chunzhi Wang
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-10930-1
Description
Summary: Gait recognition, an advanced biometric technology, offers distinctive advantages in long-distance identification and low-resolution scenarios. Nevertheless, current gait recognition methods rely predominantly on a single modality, which leads to suboptimal recognition performance. When processing multimodal data, existing algorithms frequently struggle with integration and achieve insufficient fusion because the data are heterogeneous across modalities. They also make inadequate use of spatio-temporal information in multimodal settings, and are particularly limited in capturing long-range dependencies and fine-grained dynamic features. These issues prevent existing algorithms from fully exploiting the complementary strengths of multimodal data. To address these limitations, we propose GaitSMAT, a novel gait recognition network that integrates silhouette and heatmap data through a Dual-Stream Interactive Mechanism (DSM) and a Multi-modal Hierarchical Aggregation Transformer (MHAT). The DSM enables spatial feature interaction and enhancement through bidirectional computational exchange and adaptive scaling between the two feature streams, which allows it to capture long-range dependencies in the feature maps and strengthens the representational capacity of each modality. The mechanism incorporates batch normalization and residual connections to keep feature extraction stable and effective. Furthermore, integrating the DSM with Horizontal Pyramid Pooling (HPP) strengthens inter-part anatomical associations in gait features through attentional interactions between adjacent feature strips. The proposed MHAT establishes dynamic feature interactions among silhouette representations, heatmap characteristics, and fused features through modality-specific query features and shared key-value features. This architecture not only adaptively modulates cross-modal feature importance but also enhances model robustness in complex scenarios. Comprehensive evaluations demonstrate that GaitSMAT achieves state-of-the-art (SOTA) performance on multiple public datasets (GREW, Gait3D, and SUSTech1K), with significant improvements over existing approaches. The method is particularly effective in complex environments, where it demonstrates superior robustness and accuracy. This work presents a novel technical framework for multimodal gait recognition, with substantial implications for improving the performance of practical gait recognition systems.
ISSN: 2045-2322
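
The summary describes MHAT as combining modality-specific query features with shared key-value features. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes plain scaled dot-product attention with no learned projections, uses the same toy tokens as both keys and values, and every name in it (attend, shared_kv_attention, the toy vectors) is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


def shared_kv_attention(queries_by_modality, shared_tokens):
    """Each modality supplies its own query; all of them attend over the
    same token set, used here as both keys and values (an assumption made
    for brevity; a real model would apply learned projections)."""
    return {
        name: attend(query, shared_tokens, shared_tokens)
        for name, query in queries_by_modality.items()
    }


# Toy 2-dimensional tokens standing in for the shared key-value features.
shared_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# One hypothetical query per branch named in the summary.
queries = {
    "silhouette": [2.0, 0.0],
    "heatmap": [0.0, 2.0],
    "fused": [1.0, 1.0],
}

result = shared_kv_attention(queries, shared_tokens)
for name, (output, weights) in result.items():
    print(name, [round(x, 3) for x in output], [round(w, 3) for w in weights])
```

Because the keys and values are shared, the three branches differ only in how their queries weight the same tokens, which is one simple way to read "adaptively modulates cross-modal feature importance" in the summary.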