SKVOS: Sketch-Based Video Object Segmentation with a Large-Scale Benchmark
In this paper, we propose sketch-based video object segmentation (SKVOS), a novel task that segments objects consistently across video frames using human-drawn sketches as queries. Traditional reference-based methods, such as photo masks and language descriptions, are commonly used for segmentation....
| Main Authors: | Ruolin Yang, Da Li, Conghui Hu, Honggang Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-02-01 |
| Series: | Applied Sciences |
| Subjects: | sketches; video object segmentation; sketch-based datasets |
| Online Access: | https://www.mdpi.com/2076-3417/15/4/1751 |
| _version_ | 1849723347198279680 |
|---|---|
| author | Ruolin Yang; Da Li; Conghui Hu; Honggang Zhang |
| collection | DOAJ |
| description | In this paper, we propose sketch-based video object segmentation (SKVOS), a novel task that segments objects consistently across video frames using human-drawn sketches as queries. Traditional reference-based methods, such as photo masks and language descriptions, are commonly used for segmentation. Photo masks provide high precision but are labor intensive, limiting scalability. While language descriptions are easy to provide, they often lack the specificity needed to distinguish visually similar objects within a frame. Despite their simplicity, sketches capture rich, fine-grained details of target objects and can be rapidly created, even by non-experts, making them an attractive alternative for segmentation tasks. We introduce a new approach that utilizes sketches as efficient and informative references for video object segmentation. To evaluate sketch-guided segmentation, we introduce a new benchmark consisting of three datasets: Sketch-DAVIS16, Sketch-DAVIS17, and Sketch-YouTube-VOS. Building on a memory-based framework for semi-supervised video object segmentation, we explore effective strategies for integrating sketch-based references. To ensure robust spatiotemporal coherence, we introduce two key innovations: the Temporal Relation Module and Sketch-Anchored Contrastive Learning. These modules enhance the model’s ability to maintain consistency both across time and across different object instances. Our method is evaluated on the Sketch-VOS benchmark, demonstrating superior performance with overall improvements of 1.9%, 3.3%, and 2.0% over state-of-the-art methods on the Sketch-YouTube-VOS, Sketch-DAVIS 2016, and Sketch-DAVIS 2017 validation sets, respectively. Additionally, on the YouTube-VOS validation set, our method outperforms the leading language-based VOS approach by 10.1%. |
| format | Article |
| id | doaj-art-eef09b55f5ba4653ae2eb542ec3307b1 |
| institution | DOAJ |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-eef09b55f5ba4653ae2eb542ec3307b1; 2025-08-20T03:11:03Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-02-01; vol. 15, no. 4, article 1751; DOI: 10.3390/app15041751; SKVOS: Sketch-Based Video Object Segmentation with a Large-Scale Benchmark; Ruolin Yang (School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China); Da Li (SketchX, Centre for Vision, Speech and Signal Processing, University of Surrey, Surrey GU2 7XH, UK); Conghui Hu (Department of Computer Science, National University of Singapore, Singapore 119077, Singapore); Honggang Zhang (School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China); https://www.mdpi.com/2076-3417/15/4/1751; sketches; video object segmentation; sketch-based datasets |
| title | SKVOS: Sketch-Based Video Object Segmentation with a Large-Scale Benchmark |
| topic | sketches; video object segmentation; sketch-based datasets |
| url | https://www.mdpi.com/2076-3417/15/4/1751 |