Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Sensors |
| Subjects: | |
| Online Access: | https://www.mdpi.com/1424-8220/25/15/4691 |
| Summary: | Visual object tracking is one of the core techniques in human-centered artificial intelligence and is very useful for human–machine interaction. State-of-the-art tracking methods have shown robustness and accuracy on many challenges. However, fully supervised training of their models requires a large number of videos with precise, dense annotations. Because annotating videos frame by frame is labor-intensive and time-consuming, reducing the reliance on manual annotations during the training of tracking models is an important problem. To balance annotation cost against tracking performance, we propose a weakly supervised tracking method based on co-saliency learning, which can be flexibly integrated into various tracking frameworks to reduce annotation costs and further enhance the target representation in current search images. Because our method enables the model to explore valuable visual information from unlabeled frames and to calculate co-salient attention maps based on multiple frames, it achieves performance competitive with fully supervised baseline trackers while using only 3.33% of the manual annotations. We integrate our method into two CNN-based trackers and a Transformer-based tracker; extensive experiments on four general tracking benchmarks demonstrate its effectiveness. Furthermore, we demonstrate the advantages of our method on the egocentric tracking task: our weakly supervised method obtains a success score of 0.538 on TREK-150, which is superior to the prior state-of-the-art fully supervised tracker by 7.7%. |
|---|---|
| ISSN: | 1424-8220 |