Multi-Grained Temporal Clip Transformer for Skeleton-Based Human Activity Recognition
Skeleton-based human activity recognition is a key research topic in the fields of deep learning and computer vision. However, existing approaches are less effective at capturing short-term sub-action information at different granularity levels and long-term motion correlations, which affect recogni...
Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2025-04-01
|
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/9/4768 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Skeleton-based human activity recognition is a key research topic in the fields of deep learning and computer vision. However, existing approaches are less effective at capturing short-term sub-action information at different granularity levels and long-term motion correlations, which affect recognition accuracy. To overcome these challenges, an innovative multi-grained temporal clip transformer (MTC-Former) is proposed. Firstly, based on the transformer backbone, a multi-grained temporal clip attention (MTCA) module with multi-branch architecture is proposed to capture the characteristics of short-term sub-action features. Secondly, an innovative multi-scale spatial–temporal feature interaction module is proposed to jointly learn sub-action dependencies and facilitate skeletal motion interactions, where long-range motion patterns are embedded to enhance correlation modeling. Experiments were conducted on three datasets, including NTU RGB+D, NTU RGB+D 120, and InHARD, and achieved state-of-the-art Top-1 recognition accuracy, demonstrating the superiority of the proposed MTC-Former. |
|---|---|
| ISSN: | 2076-3417 |