SKVOS: Sketch-Based Video Object Segmentation with a Large-Scale Benchmark

Bibliographic Details
Main Authors: Ruolin Yang, Da Li, Conghui Hu, Honggang Zhang
Format: Article
Language: English
Published: MDPI AG, 2025-02-01
Series: Applied Sciences
Subjects: sketches; video object segmentation; sketch-based datasets
Online Access: https://www.mdpi.com/2076-3417/15/4/1751

Abstract

In this paper, we propose sketch-based video object segmentation (SKVOS), a novel task that segments objects consistently across video frames using human-drawn sketches as queries. Photo masks and language descriptions are the references most commonly used to specify the target object. Photo masks provide high precision but are labor intensive to produce, which limits scalability. Language descriptions are easy to provide, but they often lack the specificity needed to distinguish visually similar objects within a frame. Despite their simplicity, sketches capture rich, fine-grained details of target objects and can be created rapidly, even by non-experts, which makes them an attractive alternative for segmentation tasks. We introduce a new approach that uses sketches as efficient and informative references for video object segmentation. To evaluate sketch-guided segmentation, we introduce a new benchmark consisting of three datasets: Sketch-DAVIS16, Sketch-DAVIS17, and Sketch-YouTube-VOS. Building on a memory-based framework for semi-supervised video object segmentation, we explore effective strategies for integrating sketch-based references. To ensure robust spatiotemporal coherence, we introduce two key innovations: the Temporal Relation Module and Sketch-Anchored Contrastive Learning. These modules enhance the model's ability to maintain consistency both across time and across different object instances. Evaluated on the Sketch-VOS benchmark, our method outperforms state-of-the-art methods by 1.9%, 3.3%, and 2.0% overall on the Sketch-YouTube-VOS, Sketch-DAVIS16, and Sketch-DAVIS17 validation sets, respectively. Additionally, on the YouTube-VOS validation set, our method outperforms the leading language-based VOS approach by 10.1%.
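
The abstract says only that SKVOS builds on a memory-based framework for semi-supervised video object segmentation. For readers new to that family, here is a minimal sketch of the generic memory readout popularized by Space-Time Memory Networks (STM). Each location in the current frame compares its key with the keys stored from earlier frames. The similarities are normalized with a softmax, and the stored value vectors are averaged using those weights. This is an illustration under stated assumptions, not the authors' implementation. The helper names (`dot`, `softmax`, `memory_readout`) and the plain-list representation are hypothetical. This record does not say how SKVOS feeds sketch features into the memory, so sketches are not modeled here.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per spatial location


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two key vectors (hypothetical helper)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Normalize similarity scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_readout(query_keys: Matrix, memory_keys: Matrix, memory_values: Matrix) -> Matrix:
    """Generic STM-style readout (hypothetical sketch, not SKVOS code).

    Every query location attends over all memory locations from past frames,
    using an unscaled dot-product affinity, and returns the softmax-weighted
    average of their value vectors.
    """
    value_dim = len(memory_values[0])
    readout = []
    for q in query_keys:
        weights = softmax([dot(q, k) for k in memory_keys])
        readout.append(
            [sum(w * v[d] for w, v in zip(weights, memory_values)) for d in range(value_dim)]
        )
    return readout


if __name__ == "__main__":
    # Two memory slots with orthogonal keys holding values 1.0 and 0.0.
    readout = memory_readout(
        query_keys=[[10.0, 0.0], [0.0, 0.0]],
        memory_keys=[[1.0, 0.0], [0.0, 1.0]],
        memory_values=[[1.0], [0.0]],
    )
    print(round(readout[0][0], 4))  # 1.0: nearly all weight on the first slot
    print(readout[1][0])  # 0.5: a zero query weights both slots equally
```

A query key that aligns with one stored key draws almost all of its output from that slot. An uninformative query averages the memory. Practical frameworks compute the same readout as a batched matrix product over feature maps rather than with Python loops, and they differ in the similarity function they use.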

Publication Details
DOI: 10.3390/app15041751
ISSN: 2076-3417
Volume 15, Issue 4, Article 1751
Affiliations:
Ruolin Yang: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
Da Li: SketchX, Centre for Vision, Speech and Signal Processing, University of Surrey, Surrey GU2 7XH, UK
Conghui Hu: Department of Computer Science, National University of Singapore, Singapore 119077, Singapore
Honggang Zhang: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China