Semi-supervised action recognition using logit aligned consistency and adaptive negative learning


Bibliographic Details
Main Authors: Fengyun Zuo, Yang Xu, Minggang Wang
Format: Article
Language:English
Published: Nature Portfolio 2025-05-01
Series:Scientific Reports
Subjects:
Online Access:https://doi.org/10.1038/s41598-025-01922-2
author Fengyun Zuo
Yang Xu
Minggang Wang
collection DOAJ
description Abstract As video content proliferates online, semi-supervised action recognition can address the increasingly high cost of video annotation, yet it still faces significant challenges, particularly in the underexplored application of Vision Transformers. In this paper, we present Full-SVFormer, a simple yet efficient semi-supervised action recognition architecture based on the Transformer framework. Full-SVFormer uses a pre-trained TimeSformer as its backbone network, balancing the accuracy and speed of Transformers in semi-supervised action recognition. Within the stable pseudo-label framework EMA-Teacher, we introduce a KL divergence loss, preceded by logit standardization preprocessing, as the unsupervised consistency loss. This sharpens the student's focus on the inherent relationship between the student's and teacher's logits. Furthermore, we incorporate the Adaptive Negative Learning (ANL) method to introduce additional negative pseudo-labels: the model's Top-k performance is evaluated dynamically to assign negative labels adaptively, making better use of ambiguous predictions. We conducted extensive experiments on two widely used datasets, UCF-101 and HMDB-51, where our overall results achieve superior performance compared to previous methods. Our work further advances Transformers in the domain of semi-supervised action recognition.
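The two techniques named in the abstract can be sketched as follows. This is an illustrative reconstruction from the abstract alone, not the authors' released code: logits are standardized per sample (zero mean, unit variance) before a KL divergence consistency loss between teacher and student, and classes outside the teacher's Top-k predictions receive negative pseudo-labels. All function names here are hypothetical.

```python
import math

def standardize(logits, eps=1e-6):
    """Logit standardization preprocessing: shift a sample's logits to
    zero mean and scale to unit variance, so the consistency loss
    compares relative logit structure rather than absolute magnitudes."""
    mean = sum(logits) / len(logits)
    var = sum((z - mean) ** 2 for z in logits) / len(logits)
    return [(z - mean) / (math.sqrt(var) + eps) for z in logits]

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_consistency(student_logits, teacher_logits):
    """Unsupervised consistency loss: KL(teacher || student) computed
    on standardized logits, as in the EMA-Teacher setup described."""
    p = softmax(standardize(teacher_logits))  # teacher = target distribution
    q = softmax(standardize(student_logits))  # student = prediction
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def negative_labels(teacher_logits, k):
    """Adaptive negative learning (simplified): classes outside the
    teacher's Top-k predictions are assigned negative pseudo-labels,
    i.e. "this clip is NOT class c"."""
    topk = sorted(range(len(teacher_logits)),
                  key=lambda i: -teacher_logits[i])[:k]
    return [i for i in range(len(teacher_logits)) if i not in topk]
```

In the simplification above, `k` is fixed; the paper's ANL instead adapts it from the model's measured Top-k accuracy during training, so the negative set shrinks as the teacher becomes more reliable.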
format Article
id doaj-art-b04147f846ea45bbbbc710255f60a8e2
institution DOAJ
issn 2045-2322
language English
publishDate 2025-05-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
affiliations Fengyun Zuo: College of Big Data and Information Engineering, Guizhou University
Yang Xu: College of Big Data and Information Engineering, Guizhou University
Minggang Wang: Zunyi Aluminum Stock Corporation Ltd
title Semi-supervised action recognition using logit aligned consistency and adaptive negative learning
topic Adaptive negative learning
Logit standardization preprocessing
Negative labels
Semi-supervised action recognition
Vision transformer
url https://doi.org/10.1038/s41598-025-01922-2