Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking
Visual object tracking is one of the core techniques in human-centered artificial intelligence and is very useful for human–machine interaction. State-of-the-art tracking methods have shown their robustness and accuracy on many challenges. However, a large number of videos with precisely dense an...
| Main Authors: | Jie Zhao, Ying Gao, Chunjuan Bo, Dong Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Sensors |
| Subjects: | visual tracking; weakly supervised learning; co-saliency attention; egocentric tracking |
| Online Access: | https://www.mdpi.com/1424-8220/25/15/4691 |
| _version_ | 1849239626795974656 |
|---|---|
| author | Jie Zhao Ying Gao Chunjuan Bo Dong Wang |
| author_facet | Jie Zhao Ying Gao Chunjuan Bo Dong Wang |
| author_sort | Jie Zhao |
| collection | DOAJ |
| description | Visual object tracking is one of the core techniques in human-centered artificial intelligence and is very useful for human–machine interaction. State-of-the-art tracking methods have shown their robustness and accuracy on many challenges. However, a large number of videos with precisely dense annotations are required for fully supervised training of their models. Because annotating videos frame by frame is labor-intensive and time-consuming, reducing the reliance on manual annotations during the training of tracking models is an important problem to resolve. To balance annotation costs against tracking performance, we propose a weakly supervised tracking method based on co-saliency learning, which can be flexibly integrated into various tracking frameworks to reduce annotation costs and further enhance the target representation in current search images. Since our method enables the model to explore valuable visual information from unlabeled frames and to calculate co-salient attention maps based on multiple frames, our weakly supervised methods can obtain competitive performance compared to fully supervised baseline trackers, using only 3.33% of manual annotations. We integrate our method into two CNN-based trackers and a Transformer-based tracker; extensive experiments on four general tracking benchmarks demonstrate the effectiveness of our method. Furthermore, we demonstrate the advantages of our method on the egocentric tracking task; our weakly supervised method obtains 0.538 success on TREK-150, which surpasses the prior state-of-the-art fully supervised tracker by 7.7%. |
| format | Article |
| id | doaj-art-efe76fdbd30a48139eb80954a86c90ba |
| institution | Kabale University |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-efe76fdbd30a48139eb80954a86c90ba; 2025-08-20T04:00:54Z; eng; MDPI AG; Sensors; 1424-8220; 2025-07-01; 25; 15; 4691; 10.3390/s25154691; Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking; Jie Zhao (0), Ying Gao (1), Chunjuan Bo (2), Dong Wang (3); (0) Dalian University of Technology, Dalian 116024, China; (1) Henan Key Laboratory of Safety Technology for Water Conservancy Project, Henan Water & Power Engineering Consulting Co., Ltd., Zhengzhou 451450, China; (2) Dalian Minzu University, Dalian 116620, China; (3) Dalian University of Technology, Dalian 116024, China; https://www.mdpi.com/1424-8220/25/15/4691; visual tracking; weakly supervised learning; co-saliency attention; egocentric tracking |
| spellingShingle | Jie Zhao Ying Gao Chunjuan Bo Dong Wang Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking Sensors visual tracking weakly supervised learning co-saliency attention egocentric tracking |
| title | Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking |
| title_full | Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking |
| title_fullStr | Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking |
| title_full_unstemmed | Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking |
| title_short | Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking |
| title_sort | generalized hierarchical co saliency learning for label efficient tracking |
| topic | visual tracking weakly supervised learning co-saliency attention egocentric tracking |
| url | https://www.mdpi.com/1424-8220/25/15/4691 |
| work_keys_str_mv | AT jiezhao generalizedhierarchicalcosaliencylearningforlabelefficienttracking AT yinggao generalizedhierarchicalcosaliencylearningforlabelefficienttracking AT chunjuanbo generalizedhierarchicalcosaliencylearningforlabelefficienttracking AT dongwang generalizedhierarchicalcosaliencylearningforlabelefficienttracking |