Contrastive Mask Learning for Self-Supervised 3D Skeleton-Based Action Recognition
In this paper, we propose a contrastive mask learning (CML) method for self-supervised 3D skeleton-based action recognition. Specifically, the mask modeling mechanism is integrated into multi-level contrastive learning with the aim of forming a mutually beneficial learning scheme from both contrasti...
Saved in:
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2025-02-01
|
| Series: | Sensors |
| Subjects: | |
| Online Access: | https://www.mdpi.com/1424-8220/25/5/1521 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | In this paper, we propose a contrastive mask learning (CML) method for self-supervised 3D skeleton-based action recognition. Specifically, the mask modeling mechanism is integrated into multi-level contrastive learning with the aim of forming a mutually beneficial learning scheme from both contrastive learning and masked skeleton reconstruction. The contrastive objective is extended from an individual skeleton instance to clusters by closing the gap between cluster assignment from different instances of the same category, with the goal of pursuing inter-instance consistency. Compared with previous methods, CML integrates contrastive and masked learning comprehensively and enables intra-/inter-instance consistency pursuit via multi-level contrast, which leads to more discriminative skeleton representation learning. Our extensive evaluation of the challenging NTU RGB+D and PKU-MMD benchmarks demonstrates that representations learned via CML exhibit superior discriminability, consistently outperforming state-of-the-art methods in terms of action recognition accuracy. |
|---|---|
| ISSN: | 1424-8220 |