Semi-supervised action recognition using logit aligned consistency and adaptive negative learning
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-01922-2 |
| Summary: | Abstract: With the rise of the social video era, semi-supervised action recognition can address the increasingly high cost of video annotation, but it still faces significant challenges, particularly in the underexplored application of the Vision Transformer. In this paper, we present Full-SVFormer, a simple yet efficient semi-supervised action recognition architecture based on the Transformer framework. Full-SVFormer uses a TimeSformer initialized with pre-trained weights as its backbone network, balancing the accuracy and speed of Transformers in semi-supervised action recognition. Within the stable pseudo-label framework EMA-Teacher, we introduce a KL divergence loss, preceded by logit standardization, as the unsupervised consistency loss; this sharpens the student's focus on the inherent relationship between the student's and teacher's logits. Furthermore, we incorporate the Adaptive Negative Learning (ANL) method to introduce additional negative pseudo-labels: it dynamically evaluates the model's Top-k performance to adaptively assign negative labels, making better use of ambiguous prediction examples. We conducted extensive experiments on two large datasets, UCF-101 and HMDB-51, where our overall results achieved superior performance compared to previous methods. Our work further advances the development of the Transformer in the domain of semi-supervised action recognition. |
| ISSN: | 2045-2322 |
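The abstract describes a consistency loss in which both the student's and teacher's logits are standardized (z-scored) before computing a KL divergence, so that only the relative structure of the logits, not their scale or shift, drives the loss. The paper's exact formulation is not given in this record; the following is a minimal NumPy sketch under that assumption, with `tau` as a hypothetical temperature parameter.

```python
import numpy as np

def standardize(z, eps=1e-8):
    """Z-score each logit vector: subtract its mean, divide by its std."""
    z = np.asarray(z, dtype=float)
    return (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + eps)

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(-1, keepdims=True)

def logit_standardized_kl(student_logits, teacher_logits, tau=1.0):
    """Sketch of a logit-standardized KL consistency loss:
    KL(teacher || student) computed on standardized, temperature-scaled logits."""
    p = softmax(standardize(teacher_logits) / tau)        # teacher target
    log_q = np.log(softmax(standardize(student_logits) / tau))  # student prediction
    return float((p * (np.log(p) - log_q)).sum(-1).mean())
```

Because standardization removes the mean and scale of each logit vector, any affine transform of the student's logits (e.g. `[1, 2, 3]` vs. `[2, 4, 6]`) yields the same standardized distribution and hence zero loss, which is the property the abstract attributes to this preprocessing.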
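The Adaptive Negative Learning step assigns negative pseudo-labels to classes the model is confident are wrong, with the cutoff chosen from the model's Top-k performance. The record does not give the exact rule, so this sketch simply takes `k` as an input (in the paper it would be set adaptively from measured Top-k accuracy) and marks every class outside the Top-k as a negative label.

```python
import numpy as np

def adaptive_negative_labels(probs, k):
    """Sketch of ANL-style negative pseudo-labeling: classes outside the
    model's Top-k predictions are treated as negative labels. `k` is
    assumed to be chosen adaptively elsewhere (e.g. from Top-k accuracy
    on labeled data); here it is passed in directly."""
    probs = np.asarray(probs)
    order = np.argsort(probs, axis=-1)[:, ::-1]   # classes sorted by confidence
    neg = np.ones_like(probs, dtype=bool)
    rows = np.arange(probs.shape[0])[:, None]
    neg[rows, order[:, :k]] = False               # Top-k stay candidate positives
    return neg  # True where the class receives a negative pseudo-label
```

A training loop would then penalize probability mass on the `True` entries (e.g. via a `-log(1 - p_c)` term), so even ambiguous examples whose Top-1 prediction is unreliable still contribute a supervisory signal.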