When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation

Bibliographic Details
Main Authors: Yuli Zhou, Guolei Sun, Yawei Li, Guo-Sen Xie, Luca Benini, Ender Konukoglu
Format: Article
Language: English
Published: Springer 2025-06-01
Series: Visual Intelligence
Online Access: https://doi.org/10.1007/s44267-025-00082-1
Description
Summary: This study investigates the application and performance of the Segment Anything Model 2 (SAM2) on the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects in videos that blend seamlessly into their surroundings because of similar colors and textures and poor lighting conditions. Compared with objects in normal scenes, camouflaged objects are much more difficult to detect. SAM2, a video foundation model, has shown potential in various tasks. However, its effectiveness in dynamic camouflaged scenarios remains under-explored. This work presents a comprehensive evaluation of SAM2’s ability in VCOS. First, we assess SAM2’s performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we adapt SAM2 specifically by fine-tuning it on a video camouflaged dataset. Our experiments demonstrate that SAM2 has excellent zero-shot ability to detect camouflaged objects in videos. We also show that this ability can be further improved by adjusting SAM2’s parameters specifically for VCOS.
ISSN: 2097-3330, 2731-9008
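
The summary mentions three prompt types (click, box, and mask). As a rough illustration only, the sketch below shows one way a click prompt and a box prompt could be derived from a binary mask. The names `box_prompt` and `click_prompt` and the centroid-based click rule are assumptions made for this example; they are not taken from the article and are not part of any SAM2 API.

```python
from typing import List, Tuple

# Binary mask as rows of 0/1 values: 1 = object pixel, 0 = background.
Mask = List[List[int]]


def _foreground(mask: Mask) -> List[Tuple[int, int]]:
    """Return (x, y) coordinates of all foreground pixels."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    return coords


def box_prompt(mask: Mask) -> Tuple[int, int, int, int]:
    """Hypothetical helper: tightest (x_min, y_min, x_max, y_max) box around the object."""
    coords = _foreground(mask)
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def click_prompt(mask: Mask) -> Tuple[int, int]:
    """Hypothetical helper: the foreground pixel closest to the mask centroid.

    Picking an actual foreground pixel (rather than the raw centroid) keeps
    the click on the object even when the shape is concave.
    """
    coords = _foreground(mask)
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


# Toy 4x5 mask used only to exercise the helpers.
example_mask: Mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

example_box = box_prompt(example_mask)      # (1, 1, 3, 2)
example_click = click_prompt(example_mask)  # (2, 2)
```

A mask prompt in this toy setting would simply be `example_mask` itself. How the article actually samples its prompts is not stated in this record.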