Dual-stream interactive mechanism with multi-modal hierarchical aggregation transformer for gait recognition

Bibliographic Details
Main Authors: Jinghang Liu, Xiangyuan Xu, Yan Qiu, Chunzhi Wang
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-10930-1
_version_ 1849235443463225344
author Jinghang Liu
Xiangyuan Xu
Yan Qiu
Chunzhi Wang
collection DOAJ
description Abstract Gait recognition, as an advanced biometric technology, offers distinctive advantages in long-distance identification and low-resolution scenarios. Nevertheless, current gait recognition methodologies predominantly rely on unimodal approaches, leading to suboptimal recognition performance. When processing multimodal data, existing algorithms frequently encounter challenges such as integration difficulties and insufficient fusion due to data heterogeneity across different modalities. Additionally, there is inadequate utilization of spatio-temporal information in multimodal environments, particularly limitations in capturing long-range dependencies and fine-grained dynamic features. These issues prevent existing algorithms from fully leveraging the complementary advantages of multimodal data. To address these limitations, we propose GaitSMAT: a novel gait recognition network integrating silhouette and heatmap data through a Dual-Stream Interactive Mechanism (DSM) and Multi-modal Hierarchical Aggregation Transformer (MHAT). The DSM component facilitates effective spatial feature interaction and enhancement through bidirectional computational exchange and adaptive scaling between feature streams, enabling comprehensive capture of long-range dependencies in feature maps and strengthening representational capacity of each modality. This mechanism incorporates batch normalization and residual connections to ensure stable and effective feature extraction. Furthermore, DSM integration with Horizontal Pyramid Pooling (HPP) strengthens inter-part anatomical associations in gait features through attentional interactions between adjacent feature strips. The proposed MHAT establishes dynamic feature interactions among silhouette representations, heatmap characteristics, and fused features through modality-specific query features and shared key-value features. This architecture not only adaptively modulates cross-modal feature importance but also enhances model robustness in complex scenarios. Comprehensive evaluations demonstrate that GaitSMAT achieves state-of-the-art (SOTA) performance on multiple public datasets (GREW, Gait3D, and SUSTech1K), exhibiting significant improvements over existing approaches. The method shows particular efficacy in complex environments, demonstrating superior robustness and accuracy. This work presents a novel technical framework for multimodal gait recognition, offering substantial implications for enhancing practical gait recognition system performance.
format Article
id doaj-art-2a23e3081ce54eeaa3375c070bdbe5cc
institution Kabale University
issn 2045-2322
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-2a23e3081ce54eeaa3375c070bdbe5cc 2025-08-20T04:02:46Z eng Nature Portfolio Scientific Reports 2045-2322 2025-07-01 151116 10.1038/s41598-025-10930-1
Dual-stream interactive mechanism with multi-modal hierarchical aggregation transformer for gait recognition
Jinghang Liu (0), Xiangyuan Xu (1), Yan Qiu (2), Chunzhi Wang (3)
(0) School of Computer Science, Hubei University of Technology
(1) School of Computer Science, Hubei University of Technology
(2) Computer Department, Hubei University of Technology Engineering and Technology College
(3) School of Computer Science, Hubei University of Technology
https://doi.org/10.1038/s41598-025-10930-1
title Dual-stream interactive mechanism with multi-modal hierarchical aggregation transformer for gait recognition
url https://doi.org/10.1038/s41598-025-10930-1
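The description in this record says MHAT relates silhouette, heatmap, and fused features through modality-specific query features and shared key-value features. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes plain scaled dot-product attention, random projection weights, and a shared key/value set built from the tokens of all three streams. Every name in it (`shared_kv_attention`, `attend`, the stream labels, the token counts and dimension) is hypothetical and does not come from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def shared_kv_attention(streams, dim, seed=0):
    """Each stream gets its own query projection; keys and values are shared.

    Assumption: the shared keys/values are projections of the tokens of all
    streams stacked together. The record does not specify how they are built.
    """
    rng = random.Random(seed)
    w_k = random_matrix(dim, dim, rng)  # shared key projection
    w_v = random_matrix(dim, dim, rng)  # shared value projection
    all_tokens = [tok for toks in streams.values() for tok in toks]
    keys = matmul(all_tokens, w_k)
    values = matmul(all_tokens, w_v)

    outputs = {}
    for name, toks in streams.items():
        w_q = random_matrix(dim, dim, rng)  # stream-specific query projection
        outputs[name] = attend(matmul(toks, w_q), keys, values)
    return outputs


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical inputs: 4 tokens of width 8 per stream.
    streams = {
        "silhouette": random_matrix(4, 8, rng),
        "heatmap": random_matrix(4, 8, rng),
        "fused": random_matrix(4, 8, rng),
    }
    for name, (out, weights) in shared_kv_attention(streams, dim=8).items():
        print(name, len(out), "x", len(out[0]), "attending over", len(weights[0]), "tokens")
```

Because every stream attends over the same keys and values, the attention weights show directly how much each modality draws on the others. A trained model would learn the projections rather than sample them at random.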