Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
Prompt tuning visual-language models (VLMs) for specialized tasks often involves learning task-specific textual tokens, which tailor the pre-existing, broad capabilities of a VLM to more narrowly focused applications. This approach, exemplified by CoOp-based methods, combines learnable textual tokens with category tokens to foster nuanced textual comprehension. Nonetheless, such specialized textual knowledge often fails to generalize beyond familiar categories, because it tends to overshadow the versatile, general textual knowledge that underpins the model's wide-ranging applicability. Addressing this base-novel dilemma, we propose Sparse Knowledge-guided Context Optimization (Sparse-KgCoOp), which aims to strengthen the capacity of adaptable prompts to generalize to unseen categories. The cornerstone of Sparse-KgCoOp is the premise that reducing the discrepancy between adaptive prompts and their hand-crafted counterparts through sparsification can mitigate the erosion of the model's fundamental knowledge. Specifically, Sparse-KgCoOp narrows the gap between the textual embeddings produced by the learnable prompts and those produced by the manually devised ones, preserving foundational knowledge while maintaining adaptability. Extensive experiments on several benchmarks demonstrate that the proposed Sparse-KgCoOp is an efficient method for prompt tuning.
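The mechanism the abstract describes can be sketched compactly. The snippet below is a minimal, hypothetical PyTorch rendering, not the authors' released implementation: the function names `sparse_kg_loss` and `total_loss`, the top-k masking used for sparsification, and the weight `lambda_kg` are all assumptions. Only the overall idea comes from the abstract: a cross-entropy task loss plus a sparsified penalty that pulls the learnable prompts' text embeddings toward frozen embeddings of hand-crafted prompts.

```python
import torch
import torch.nn.functional as F

def sparse_kg_loss(learned_feats: torch.Tensor,
                   handcrafted_feats: torch.Tensor,
                   sparsity: float = 0.5) -> torch.Tensor:
    """Sparse knowledge-guided regularizer (sketch, not the paper's code).

    learned_feats:     [C, D] text embeddings from the learnable prompts
    handcrafted_feats: [C, D] frozen embeddings of hand-crafted prompts
                       (e.g., "a photo of a {class}")
    sparsity:          fraction of embedding dimensions kept by the mask
                       (assumed; the paper's exact sparsification may differ)
    """
    learned = F.normalize(learned_feats, dim=-1)
    fixed = F.normalize(handcrafted_feats, dim=-1)
    diff = (learned - fixed).pow(2)          # per-dimension squared gap
    k = max(1, int(sparsity * diff.shape[-1]))
    # Keep only the k largest per-class discrepancies: a simple top-k
    # sparsification so the penalty focuses on the dimensions that drift
    # furthest from the hand-crafted (general-knowledge) embeddings.
    topk = diff.topk(k, dim=-1).values
    return topk.mean()

def total_loss(logits, labels, learned_feats, handcrafted_feats,
               lambda_kg: float = 8.0):
    # Task cross-entropy plus the sparse knowledge-guided term that anchors
    # adaptive prompts to general textual knowledge. lambda_kg is assumed.
    ce = F.cross_entropy(logits, labels)
    return ce + lambda_kg * sparse_kg_loss(learned_feats, handcrafted_feats)
```

Penalizing only the largest per-dimension gaps, rather than the full embedding distance, reflects the abstract's claim that sparsification preserves adaptability: most dimensions remain free to specialize, while the ones that drift most from the general knowledge are pulled back.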
| Main Authors: | Qiangxing Tian (School of Information and Electrical Engineering, Hangzhou City University, Hangzhou 310015, China), Min Zhang (School of Computer Science and Technology, East China Normal University, Shanghai 200062, China) |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Entropy, Vol. 27, Issue 3, Article 301 |
| ISSN: | 1099-4300 |
| DOI: | 10.3390/e27030301 |
| Subjects: | visual-language models; prompt tuning; sparse knowledge-guided context optimization |
| Online Access: | https://www.mdpi.com/1099-4300/27/3/301 |