Semi-supervised action recognition using logit aligned consistency and adaptive negative learning


Bibliographic Details
Main Authors: Fengyun Zuo, Yang Xu, Minggang Wang
Format: Article
Language:English
Published: Nature Portfolio 2025-05-01
Series:Scientific Reports
Subjects:
Online Access:https://doi.org/10.1038/s41598-025-01922-2
author Fengyun Zuo
Yang Xu
Minggang Wang
collection DOAJ
description Abstract As video content proliferates online, semi-supervised action recognition can address the increasingly high cost of video annotation, yet it still faces significant challenges, particularly in the underexplored application of Vision Transformers. In this paper, we present Full-SVFormer, a simple yet efficient semi-supervised action recognition architecture based on the Transformer framework. Full-SVFormer uses a pre-trained TimeSformer as its backbone network, balancing the accuracy and speed of Transformers in semi-supervised action recognition. Within the stable pseudo-label framework EMA-Teacher, we introduce a KL divergence loss, preceded by logit standardization preprocessing, as the unsupervised consistency loss. This sharpens the student's focus on the inherent relationship between the student's and teacher's logits. Furthermore, we incorporate the Adaptive Negative Learning (ANL) method to introduce additional negative pseudo-labels: the model's Top-k performance is evaluated dynamically to assign negative labels adaptively, making better use of ambiguous predictions. We conducted extensive experiments on two widely used datasets, UCF-101 and HMDB-51, where our overall results achieve superior performance compared to previous methods. Our work further advances Transformers in the domain of semi-supervised action recognition.
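The two techniques named in the abstract can be sketched as follows. This is an illustrative reconstruction from the abstract alone, not the authors' released code: logits are standardized per sample (zero mean, unit variance) before a KL divergence consistency loss between teacher and student, and classes outside the teacher's Top-k predictions receive negative pseudo-labels. All function names here are hypothetical.

```python
import math

def standardize(logits, eps=1e-6):
    """Logit standardization preprocessing: shift a sample's logits to
    zero mean and scale to unit variance, so the consistency loss
    compares relative logit structure rather than absolute magnitudes."""
    mean = sum(logits) / len(logits)
    var = sum((z - mean) ** 2 for z in logits) / len(logits)
    return [(z - mean) / (math.sqrt(var) + eps) for z in logits]

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_consistency(student_logits, teacher_logits):
    """Unsupervised consistency loss: KL(teacher || student) computed
    on standardized logits, as in the EMA-Teacher setup described."""
    p = softmax(standardize(teacher_logits))  # teacher = target distribution
    q = softmax(standardize(student_logits))  # student = prediction
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def negative_labels(teacher_logits, k):
    """Adaptive negative learning (simplified): classes outside the
    teacher's Top-k predictions are assigned negative pseudo-labels,
    i.e. "this clip is NOT class c"."""
    topk = sorted(range(len(teacher_logits)),
                  key=lambda i: -teacher_logits[i])[:k]
    return [i for i in range(len(teacher_logits)) if i not in topk]
```

In the simplification above, `k` is fixed; the paper's ANL instead adapts it from the model's measured Top-k accuracy during training, so the negative set shrinks as the teacher becomes more reliable.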
format Article
id doaj-art-b04147f846ea45bbbbc710255f60a8e2
institution DOAJ
issn 2045-2322
language English
publishDate 2025-05-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
affiliations Fengyun Zuo: College of Big Data and Information Engineering, Guizhou University
Yang Xu: College of Big Data and Information Engineering, Guizhou University
Minggang Wang: Zunyi Aluminum Stock Corporation Ltd
title Semi-supervised action recognition using logit aligned consistency and adaptive negative learning
topic Adaptive negative learning
Logit standardization preprocessing
Negative labels
Semi-supervised action recognition
Vision transformer
url https://doi.org/10.1038/s41598-025-01922-2