Parameter-efficient weakly supervised referring video object segmentation via chain-of-thought reasoning
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s40747-025-01900-1 |
| Summary: | Abstract Referring video object segmentation (RVOS) aims to segment the object corresponding to a language expression in a video. Most existing RVOS methods are trained using accurate per-pixel annotations, which are expensive and time-consuming to obtain. Moreover, they need to update all parameters of a segmentation model, making training inefficient as the model scale increases. In this paper, we propose a novel parameter-efficient framework under weak supervision, dubbed ReferringAdapter, to address both issues. Specifically, we propose to adapt an off-the-shelf image segmentation model for RVOS by plugging a small set of trained parameters, i.e., an adapter, into its intermediate layers. This efficiently endows a uni-modal image segmentation model with the cross-modal ability to segment the video object referred to by a language expression. To update the adapter parameters under weak supervision, instead of directly fusing the video and sentence-level language features, we propose chain-of-thought reasoning to consider the intermediate steps along the thought process. Extensive experiments demonstrate that training the adapter with 1.1% of the total parameters can outperform previous weakly supervised methods by 11.6-15.3 mAP and achieve comparable performance with fully supervised ones. |
| ISSN: | 2199-4536, 2198-6053 |
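The abstract's core idea is to freeze a pre-trained segmentation model and train only a small adapter module inserted into its intermediate layers. As a minimal sketch of that mechanism (not the paper's implementation; the function names, shapes, and the bottleneck design are assumptions based on common adapter architectures), an adapter typically down-projects features to a low rank, applies a nonlinearity, up-projects back, and adds a residual connection:

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add.

    Only W_down (r x d) and W_up (d x r) are trained; the host model's
    features x (length d) pass through frozen layers unchanged elsewhere.
    This is a hypothetical illustration of the general adapter pattern.
    """
    h = [max(0.0, v) for v in matvec(W_down, x)]   # d -> r bottleneck + ReLU
    delta = matvec(W_up, h)                        # r -> d up-projection
    return [xi + di for xi, di in zip(x, delta)]   # residual connection

# With W_up initialized to zeros (a common choice), the adapter is an
# identity map at the start of training, so plugging it in does not
# disturb the frozen model's behavior.
d, r = 4, 2
x = [1.0, 2.0, 3.0, 4.0]
W_down = [[0.1, 0.2, 0.3, 0.4],
          [0.5, -0.1, 0.2, 0.0]]
W_up = [[0.0, 0.0] for _ in range(d)]
out = adapter(x, W_down, W_up)
```

The residual form also explains the parameter efficiency claimed in the abstract: with bottleneck rank r much smaller than the feature dimension d, the adapter adds only 2*r*d weights per layer, a small fraction of the host model's total.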