Tuning-Free Universally-Supervised Semantic Segmentation

Bibliographic Details
Main Authors: Xiaobo Yang, Xiaojin Gong
Format: Article
Language: English
Published: IEEE 2024-01-01
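Series: IEEE Access
Subjects: Foundation model; open-vocabulary semantic segmentation; semantic segmentation; semi-supervised semantic segmentation; vision language model; weakly supervised semantic segmentation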
Online Access:https://ieeexplore.ieee.org/document/10779462/
author Xiaobo Yang
Xiaojin Gong
collection DOAJ
description This work presents a tuning-free semantic segmentation framework based on classifying SAM masks that is universally applicable to various types of supervision. First, we use CLIP’s zero-shot classification ability to generate pseudo-labels or to perform open-vocabulary semantic segmentation. However, the misalignment between mask embeddings and CLIP text embeddings leads to suboptimal results. To address this issue, we propose discrimination-bias aligned CLIP (DBA-CLIP) to closely align mask and text embeddings, offering a performance gain with no additional overhead. We then construct a global-local consistent classifier to classify SAM masks; it reveals the intrinsic structure of the high-quality embeddings produced by DBA-CLIP and is robust to noisy pseudo-labels. Extensive experiments validate the efficiency and effectiveness of our method, which achieves state-of-the-art (SOTA) or competitive performance across various datasets and supervision types. Our code will be released upon acceptance.
format Article
id doaj-art-1b20f35cd5ac46b09542bdbee1acfada
institution DOAJ
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
Record timestamp: 2025-08-20T02:49:09Z
Volume: 12
Pages: 187329-187342
DOI: 10.1109/ACCESS.2024.3512379
IEEE Xplore document number: 10779462
Author details: Xiaobo Yang (ORCID https://orcid.org/0009-0003-7885-302X) and Xiaojin Gong (ORCID https://orcid.org/0000-0001-9955-3569), both with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China
title Tuning-Free Universally-Supervised Semantic Segmentation
topic Foundation model
open-vocabulary semantic segmentation
semantic segmentation
semi-supervised semantic segmentation
vision language model
weakly supervised semantic segmentation
url https://ieeexplore.ieee.org/document/10779462/
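The first step the abstract describes, classifying each SAM mask by CLIP zero-shot similarity and turning the mask labels into a pixel-level pseudo-label map, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DBA-CLIP or their global-local consistent classifier. It assumes that mask embeddings and class-prompt text embeddings have already been extracted (toy 2-D vectors stand in for them here). It also assumes a fixed logit scale of 100, matching the scale used by the public CLIP checkpoints. Overlapping masks are resolved by painting larger masks first. The function names and the ignore value 255 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Mask = Sequence[Sequence[int]]  # H x W binary mask (1 = pixel belongs to the mask)


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(
    mask_embs: Sequence[Vector],
    text_embs: Sequence[Vector],
    logit_scale: float = 100.0,  # assumed CLIP-style temperature
) -> List[Tuple[int, List[float]]]:
    """Assign each mask embedding to the class whose text embedding is most similar.

    Returns one (class_index, class_probabilities) pair per mask.
    """
    texts = [l2_normalize(t) for t in text_embs]
    results = []
    for emb in mask_embs:
        m = l2_normalize(emb)
        # Scaled cosine similarity between this mask and every class prompt.
        logits = [logit_scale * sum(a * b for a, b in zip(m, t)) for t in texts]
        # Numerically stable softmax over classes.
        peak = max(logits)
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        label = max(range(len(probs)), key=probs.__getitem__)
        results.append((label, probs))
    return results


def paint_label_map(
    masks: Sequence[Mask],
    labels: Sequence[int],
    height: int,
    width: int,
    ignore_index: int = 255,  # hypothetical "unlabeled" value
) -> List[List[int]]:
    """Turn per-mask class labels into a per-pixel pseudo-label map.

    Larger masks are painted first so that smaller masks nested inside them
    stay visible; pixels covered by no mask keep ignore_index.
    """
    label_map = [[ignore_index] * width for _ in range(height)]
    # Sort mask indices by pixel area, largest first.
    order = sorted(range(len(masks)), key=lambda i: -sum(map(sum, masks[i])))
    for i in order:
        for y in range(height):
            for x in range(width):
                if masks[i][y][x]:
                    label_map[y][x] = labels[i]
    return label_map


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real CLIP features.
    text_embs = [[1.0, 0.0], [0.0, 1.0]]  # prompts for class 0 and class 1
    mask_embs = [[0.9, 0.1], [0.2, 0.8]]
    labels = [label for label, _ in zero_shot_classify(mask_embs, text_embs)]
    masks = [[[1, 1, 0], [1, 1, 0]], [[0, 1, 1], [0, 0, 0]]]
    print(labels)                                # [0, 1]
    print(paint_label_map(masks, labels, 2, 3))  # [[0, 1, 1], [0, 0, 255]]
```

Leaving uncovered pixels at the ignore value keeps them out of any downstream training loss. The area-ordered painting is only one possible overlap rule. The abstract notes that raw CLIP mask and text embeddings are misaligned, so a plain classifier like this one is the baseline that DBA-CLIP is meant to improve on.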