When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation

Abstract: This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects in videos that blend seamlessly into their surroundings because of similar colors and textures and...

Bibliographic Details
Main Authors:	Yuli Zhou, Guolei Sun, Yawei Li, Guo-Sen Xie, Luca Benini, Ender Konukoglu
Format:	Article
Language:	English
Published:	Springer 2025-06-01
Series:	Visual Intelligence
Subjects:	Multimodal large language model Prompt engineering SAM2 Video camouflaged object segmentation
Online Access:	https://doi.org/10.1007/s44267-025-00082-1

_version_	1850207292326150144
author	Yuli Zhou Guolei Sun Yawei Li Guo-Sen Xie Luca Benini Ender Konukoglu
author_facet	Yuli Zhou Guolei Sun Yawei Li Guo-Sen Xie Luca Benini Ender Konukoglu
author_sort	Yuli Zhou
collection	DOAJ
description	Abstract: This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects in videos that blend seamlessly into their surroundings because of similar colors and textures and poor lighting conditions. Compared with objects in normal scenes, camouflaged objects are much more difficult to detect. SAM2, a video foundation model, has shown potential in various tasks. However, its effectiveness in dynamic camouflaged scenarios remains under-explored. This paper presents a comprehensive evaluation of SAM2’s ability in VCOS. First, we assess SAM2’s performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we specifically adapt SAM2 by fine-tuning it on the video camouflaged dataset. Our comprehensive experiments demonstrate that SAM2 has excellent zero-shot ability to detect camouflaged objects in videos. We also show that this ability can be further improved by specifically adjusting SAM2’s parameters for VCOS.
format	Article
id	doaj-art-0a9ae9b22764400b8ec8e6a6caffd80b
institution	OA Journals
issn	2097-3330 2731-9008
language	English
publishDate	2025-06-01
publisher	Springer
record_format	Article
series	Visual Intelligence
spelling	doaj-art-0a9ae9b22764400b8ec8e6a6caffd80b | 2025-08-20T02:10:34Z | eng | Springer | Visual Intelligence | 2097-3330 | 2731-9008 | 2025-06-01 | 31114 | 10.1007/s44267-025-00082-1 | When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation | Yuli Zhou (0), Guolei Sun (1), Yawei Li (2), Guo-Sen Xie (3), Luca Benini (4), Ender Konukoglu (5) | (0) Computer Vision Laboratory, ETH Zürich; (1) Computer Vision Laboratory, ETH Zürich; (2) Computer Vision Laboratory, ETH Zürich; (3) School of Computer Science and Engineering, Nanjing University of Science and Technology; (4) Integrated System Laboratory, ETH Zürich; (5) Computer Vision Laboratory, ETH Zürich | Abstract: This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects in videos that blend seamlessly into their surroundings because of similar colors and textures and poor lighting conditions. Compared with objects in normal scenes, camouflaged objects are much more difficult to detect. SAM2, a video foundation model, has shown potential in various tasks. However, its effectiveness in dynamic camouflaged scenarios remains under-explored. This paper presents a comprehensive evaluation of SAM2’s ability in VCOS. First, we assess SAM2’s performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we specifically adapt SAM2 by fine-tuning it on the video camouflaged dataset. Our comprehensive experiments demonstrate that SAM2 has excellent zero-shot ability to detect camouflaged objects in videos. We also show that this ability can be further improved by specifically adjusting SAM2’s parameters for VCOS. | https://doi.org/10.1007/s44267-025-00082-1 | Multimodal large language model; Prompt engineering; SAM2; Video camouflaged object segmentation
spellingShingle	Yuli Zhou Guolei Sun Yawei Li Guo-Sen Xie Luca Benini Ender Konukoglu When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation Visual Intelligence Multimodal large language model Prompt engineering SAM2 Video camouflaged object segmentation
title	When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation
title_full	When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation
title_fullStr	When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation
title_full_unstemmed	When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation
title_short	When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation
title_sort	when sam2 meets video camouflaged object segmentation a comprehensive evaluation and adaptation
topic	Multimodal large language model Prompt engineering SAM2 Video camouflaged object segmentation
url	https://doi.org/10.1007/s44267-025-00082-1

Similar Items