When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation
| Main Authors: | Yuli Zhou, Guolei Sun, Yawei Li, Guo-Sen Xie, Luca Benini, Ender Konukoglu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-06-01 |
| Series: | Visual Intelligence |
| Subjects: | Multimodal large language model; Prompt engineering; SAM2; Video camouflaged object segmentation |
| Online Access: | https://doi.org/10.1007/s44267-025-00082-1 |
| author | Yuli Zhou, Guolei Sun, Yawei Li, Guo-Sen Xie, Luca Benini, Ender Konukoglu |
|---|---|
| collection | DOAJ |
| description | This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects that blend seamlessly into their surroundings in videos because of similar colors and textures and poor lighting conditions. Compared with objects in ordinary scenes, camouflaged objects are much more difficult to detect. SAM2, a video foundation model, has shown potential in various tasks; however, its effectiveness in dynamic camouflaged scenarios remains under-explored. This paper presents a comprehensive study of SAM2’s ability in VCOS. First, we assess SAM2’s performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we specifically adapt SAM2 by fine-tuning it on the video camouflaged dataset. Our comprehensive experiments demonstrate that SAM2 has excellent zero-shot ability to detect camouflaged objects in videos. We also show that this ability can be further improved by adjusting SAM2’s parameters specifically for VCOS. |
| format | Article |
| id | doaj-art-0a9ae9b22764400b8ec8e6a6caffd80b |
| institution | OA Journals |
| issn | 2097-3330, 2731-9008 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Springer |
| series | Visual Intelligence |
| title | When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation |
| topic | Multimodal large language model; Prompt engineering; SAM2; Video camouflaged object segmentation |
| url | https://doi.org/10.1007/s44267-025-00082-1 |