Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking

Bibliographic Details
Main Authors: Jie Zhao, Ying Gao, Chunjuan Bo, Dong Wang
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/15/4691
Collection: DOAJ
Description: Visual object tracking is a core technique in human-centered artificial intelligence and is highly useful for human–machine interaction. State-of-the-art tracking methods have shown robustness and accuracy across many challenges. However, fully supervised training of these models requires a large number of videos with dense, precise annotations. Because annotating videos frame by frame is labor-intensive and time-consuming, reducing the reliance on manual annotations when training tracking models is an important problem. To balance annotation cost against tracking performance, we propose a weakly supervised tracking method based on co-saliency learning, which can be flexibly integrated into various tracking frameworks to reduce annotation costs and further enhance the target representation in current search images. Because our method enables the model to exploit visual information from unlabeled frames and to compute co-salient attention maps across multiple frames, our weakly supervised method achieves performance competitive with fully supervised baseline trackers while using only 3.33% of the manual annotations. We integrate our method into two CNN-based trackers and a Transformer-based tracker; extensive experiments on four general tracking benchmarks demonstrate its effectiveness. Furthermore, we demonstrate the advantages of our method on the egocentric tracking task: our weakly supervised method obtains a success score of 0.538 on TREK-150, outperforming the prior state-of-the-art fully supervised tracker by 7.7%.
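The description does not specify how the co-salient attention maps are computed. As a rough, hypothetical illustration only (not the authors' formulation), the Python sketch below scores each position of a search-frame feature map by how consistently a similar feature appears in several reference frames, then reweights the search features with that map. The function names `cosine`, `co_saliency_map` and `apply_attention`, the max-then-mean matching rule, the min-max normalization and the residual reweighting are all assumptions chosen for clarity.

```python
import math
from typing import List

# Hypothetical sketch: a generic co-saliency scoring rule, NOT the paper's method.
Vector = List[float]
FeatureMap = List[Vector]  # flattened H*W positions, each a C-dim feature vector


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def co_saliency_map(search: FeatureMap, references: List[FeatureMap]) -> List[float]:
    """Score each search position by how strongly it recurs across reference frames.

    Assumed rule: take the best match (max cosine similarity) of each search
    position within each reference frame, average those best matches over
    frames, then min-max normalise the scores to [0, 1].
    """
    if not references:
        raise ValueError("need at least one reference frame")
    raw = []
    for feat in search:
        best = [max(cosine(feat, r) for r in ref) for ref in references]
        raw.append(sum(best) / len(best))
    lo, hi = min(raw), max(raw)
    span = hi - lo
    if span < 1e-12:
        return [1.0 for _ in raw]  # every position equally salient
    return [(s - lo) / span for s in raw]


def apply_attention(search: FeatureMap, attn: List[float]) -> FeatureMap:
    """Reweight search features residually: f' = f * (1 + a) = f + a * f."""
    return [[x * (1.0 + a) for x in feat] for feat, a in zip(search, attn)]


# Toy example: position 0 looks like a target that recurs in both reference
# frames; position 1 looks like background that does not.
search = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
references = [
    [[1.0, 0.1], [-1.0, 0.0]],
    [[0.9, 0.0], [0.0, -1.0]],
]
attn = co_saliency_map(search, references)
enhanced = apply_attention(search, attn)
print([round(a, 3) for a in attn])  # prints [1.0, 0.0, 0.621]
```

In the toy call, the target-like feature receives the highest score and is amplified, while the background feature receives the lowest score and is left unchanged. The residual form keeps the original features intact when the map is zero, which is one common design choice for attention modules; the paper's hierarchical design may differ.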
Record ID: doaj-art-efe76fdbd30a48139eb80954a86c90ba
Institution: Kabale University
ISSN: 1424-8220
Volume: 25
Issue: 15
Article number: 4691
DOI: 10.3390/s25154691
Affiliations:
Jie Zhao: Dalian University of Technology, Dalian 116024, China
Ying Gao: Henan Key Laboratory of Safety Technology for Water Conservancy Project, Henan Water & Power Engineering Consulting Co., Ltd., Zhengzhou 451450, China
Chunjuan Bo: Dalian Minzu University, Dalian 116620, China
Dong Wang: Dalian University of Technology, Dalian 116024, China
Keywords: visual tracking; weakly supervised learning; co-saliency attention; egocentric tracking