When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation

Abstract This study investigates the application and performance of the Segment Anything Model 2 (SAM2) on the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects in videos that blend seamlessly into their surroundings because of similar colors and textures and...

Bibliographic Details
Main Authors: Yuli Zhou, Guolei Sun, Yawei Li, Guo-Sen Xie, Luca Benini, Ender Konukoglu
Format: Article
Language: English
Published: Springer 2025-06-01
Series: Visual Intelligence
Subjects: Multimodal large language model; Prompt engineering; SAM2; Video camouflaged object segmentation
Online Access: https://doi.org/10.1007/s44267-025-00082-1
_version_ 1850207292326150144
author Yuli Zhou
Guolei Sun
Yawei Li
Guo-Sen Xie
Luca Benini
Ender Konukoglu
author_sort Yuli Zhou
collection DOAJ
description Abstract This study investigates the application and performance of the Segment Anything Model 2 (SAM2) on the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects in videos that blend seamlessly into their surroundings because of similar colors and textures and poor lighting conditions. Compared with objects in normal scenes, camouflaged objects are much more difficult to detect. SAM2, a video foundation model, has shown potential in various tasks. However, its effectiveness in dynamic camouflaged scenarios remains under-explored. We present a comprehensive evaluation of SAM2’s ability in VCOS. First, we assess SAM2’s performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we adapt SAM2 specifically by fine-tuning it on a camouflaged video dataset. Our experiments demonstrate that SAM2 has excellent zero-shot ability to detect camouflaged objects in videos. We also show that this ability can be further improved by adjusting SAM2’s parameters specifically for VCOS.
format Article
id doaj-art-0a9ae9b22764400b8ec8e6a6caffd80b
institution OA Journals
issn 2097-3330
2731-9008
language English
publishDate 2025-06-01
publisher Springer
record_format Article
series Visual Intelligence
spelling doaj-art-0a9ae9b22764400b8ec8e6a6caffd80b
2025-08-20T02:10:34Z
eng
Springer
Visual Intelligence
2097-3330
2731-9008
2025-06-01
Volume 3, issue 1, pages 1-14
10.1007/s44267-025-00082-1
When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation
Yuli Zhou (Computer Vision Laboratory, ETH Zürich)
Guolei Sun (Computer Vision Laboratory, ETH Zürich)
Yawei Li (Computer Vision Laboratory, ETH Zürich)
Guo-Sen Xie (School of Computer Science and Engineering, Nanjing University of Science and Technology)
Luca Benini (Integrated System Laboratory, ETH Zürich)
Ender Konukoglu (Computer Vision Laboratory, ETH Zürich)
https://doi.org/10.1007/s44267-025-00082-1
Multimodal large language model
Prompt engineering
SAM2
Video camouflaged object segmentation
title When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation
title_sort when sam2 meets video camouflaged object segmentation a comprehensive evaluation and adaptation
topic Multimodal large language model
Prompt engineering
SAM2
Video camouflaged object segmentation
url https://doi.org/10.1007/s44267-025-00082-1