Tuning-Free Universally-Supervised Semantic Segmentation
This work presents a tuning-free semantic segmentation framework based on classifying SAM masks, which is universally applicable to various types of supervision. Initially, we utilize CLIP’s zero-shot classification ability to generate pseudo-labels or perform open-vocabulary semantic seg...
| Main Authors: | Xiaobo Yang, Xiaojin Gong |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
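| Subjects: | Foundation model; open-vocabulary semantic segmentation; semantic segmentation; semi-supervised semantic segmentation; vision language model; weakly supervised semantic segmentation |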
| Online Access: | https://ieeexplore.ieee.org/document/10779462/ |
| _version_ | 1850064865102659584 |
|---|---|
| author | Xiaobo Yang; Xiaojin Gong |
| author_facet | Xiaobo Yang; Xiaojin Gong |
| author_sort | Xiaobo Yang |
| collection | DOAJ |
| description | This work presents a tuning-free semantic segmentation framework based on classifying SAM masks, which is universally applicable to various types of supervision. Initially, we utilize CLIP’s zero-shot classification ability to generate pseudo-labels or perform open-vocabulary semantic segmentation. However, the misalignment between mask and CLIP text embeddings leads to suboptimal results. To address this issue, we propose discrimination-bias aligned CLIP (DBA-CLIP) to closely align mask and text embeddings, offering an overhead-free performance gain. We then construct a global-local consistent classifier to classify SAM masks, which reveals the intrinsic structure of high-quality embeddings produced by DBA-CLIP and demonstrates robustness against noisy pseudo-labels. Extensive experiments validate the efficiency and effectiveness of our method, and we achieve state-of-the-art (SOTA) or competitive performance across various datasets and supervision types. Our code will be released upon acceptance. |
| format | Article |
| id | doaj-art-1b20f35cd5ac46b09542bdbee1acfada |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-1b20f35cd5ac46b09542bdbee1acfada; 2025-08-20T02:49:09Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 187329-187342; DOI 10.1109/ACCESS.2024.3512379; document 10779462; Tuning-Free Universally-Supervised Semantic Segmentation; Xiaobo Yang (https://orcid.org/0009-0003-7885-302X) and Xiaojin Gong (https://orcid.org/0000-0001-9955-3569), both at the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; https://ieeexplore.ieee.org/document/10779462/; Foundation model; open-vocabulary semantic segmentation; semantic segmentation; semi-supervised semantic segmentation; vision language model; weakly supervised semantic segmentation |
| title | Tuning-Free Universally-Supervised Semantic Segmentation |
| topic | Foundation model; open-vocabulary semantic segmentation; semantic segmentation; semi-supervised semantic segmentation; vision language model; weakly supervised semantic segmentation |
| url | https://ieeexplore.ieee.org/document/10779462/ |