Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization

Prompt tuning visual-language models (VLMs) for specialized tasks often involves leveraging task-specific textual tokens, which can tailor the pre-existing, broad capabilities of a VLM to more narrowly focused applications. This approach, exemplified by CoOp-based methods, integrates mutable textual...


Saved in:
Bibliographic Details
Main Authors: Qiangxing Tian, Min Zhang
Format: Article
Language:English
Published: MDPI AG 2025-03-01
Series:Entropy
Subjects:
Online Access:https://www.mdpi.com/1099-4300/27/3/301
_version_ 1850090871018487808
author Qiangxing Tian
Min Zhang
author_facet Qiangxing Tian
Min Zhang
author_sort Qiangxing Tian
collection DOAJ
description Prompt tuning visual-language models (VLMs) for specialized tasks often involves leveraging task-specific textual tokens, which can tailor the pre-existing, broad capabilities of a VLM to more narrowly focused applications. This approach, exemplified by CoOp-based methods, integrates mutable textual tokens with categorical tokens to foster nuanced textual comprehension. Nonetheless, such specialized textual insights often fail to generalize beyond the scope of familiar categories, as they tend to overshadow the versatile, general textual knowledge intrinsic to the model’s wide-ranging applicability. Addressing this base-novel dilemma, we propose the innovative concept of <b>Sparse K</b>nowledge-<b>g</b>uided <b>Co</b>ntext <b>Op</b>timization (Sparse-KgCoOp). This technique aims to fortify the adaptable prompts’ capacity to generalize to categories not yet encountered. The cornerstone of Sparse-KgCoOp is the premise that reducing the differences between adaptive prompts and their hand-crafted counterparts through sparsification operations can mitigate the erosion of fundamental knowledge. Specifically, Sparse-KgCoOp seeks to narrow the gap between the textual embeddings produced by the dynamic prompts and the manually devised ones, thus preserving the foundational knowledge while maintaining adaptability. Extensive experiments on several benchmarks demonstrate that the proposed Sparse-KgCoOp is an efficient method for prompt tuning.
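The regularizer described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the top-k magnitude sparsifier, the squared-error gap, and all function names here are assumptions, since the abstract only states that a sparsification operation is applied before measuring the distance between learnable-prompt and hand-crafted-prompt text embeddings.

```python
import numpy as np

def topk_sparsify(emb, k):
    """Keep the k largest-magnitude entries of each embedding row, zero the rest.
    (Hypothetical sparsification operator; the paper's exact operator may differ.)"""
    drop_idx = np.argsort(np.abs(emb), axis=-1)[..., :-k]  # indices of smallest entries
    out = emb.copy()
    np.put_along_axis(out, drop_idx, 0.0, axis=-1)
    return out

def sparse_kg_regularizer(learned_emb, handcrafted_emb, k):
    """Mean squared gap between sparsified learnable-prompt embeddings and
    sparsified hand-crafted-prompt embeddings (one row per class)."""
    diff = topk_sparsify(learned_emb, k) - topk_sparsify(handcrafted_emb, k)
    return float(np.mean(diff ** 2))
```

In a full training loop this term would be added, with a weighting coefficient, to the usual cross-entropy loss on base-class images, so that the learnable context tokens stay close to the general knowledge encoded by the hand-crafted prompts while still adapting to the downstream task.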
format Article
id doaj-art-d1735aa5861840a699095cbb69339e1f
institution DOAJ
issn 1099-4300
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj-art-d1735aa5861840a699095cbb69339e1f2025-08-20T02:42:29ZengMDPI AGEntropy1099-43002025-03-0127330110.3390/e27030301Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context OptimizationQiangxing Tian0Min Zhang1School of Information and Electrical Engineering, Hangzhou City University, Hangzhou 310015, ChinaSchool of Computer Science and Technology, East China Normal University, Shanghai 200062, ChinaPrompt tuning visual-language models (VLMs) for specialized tasks often involves leveraging task-specific textual tokens, which can tailor the pre-existing, broad capabilities of a VLM to more narrowly focused applications. This approach, exemplified by CoOp-based methods, integrates mutable textual tokens with categorical tokens to foster nuanced textual comprehension. Nonetheless, such specialized textual insights often fail to generalize beyond the scope of familiar categories, as they tend to overshadow the versatile, general textual knowledge intrinsic to the model’s wide-ranging applicability. Addressing this base-novel dilemma, we propose the innovative concept of <b>Sparse K</b>nowledge-<b>g</b>uided <b>Co</b>ntext <b>Op</b>timization (Sparse-KgCoOp). This technique aims to fortify the adaptable prompts’ capacity to generalize to categories not yet encountered. The cornerstone of Sparse-KgCoOp is the premise that reducing the differences between adaptive prompts and their hand-crafted counterparts through sparsification operations can mitigate the erosion of fundamental knowledge. Specifically, Sparse-KgCoOp seeks to narrow the gap between the textual embeddings produced by the dynamic prompts and the manually devised ones, thus preserving the foundational knowledge while maintaining adaptability.
Extensive experiments on several benchmarks demonstrate that the proposed Sparse-KgCoOp is an efficient method for prompt tuning.https://www.mdpi.com/1099-4300/27/3/301visual-language modelsprompt tuningsparse knowledge-guided context optimization
spellingShingle Qiangxing Tian
Min Zhang
Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
Entropy
visual-language models
prompt tuning
sparse knowledge-guided context optimization
title Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
title_full Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
title_fullStr Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
title_full_unstemmed Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
title_short Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
title_sort enhancing visual language prompt tuning through sparse knowledge guided context optimization
topic visual-language models
prompt tuning
sparse knowledge-guided context optimization
url https://www.mdpi.com/1099-4300/27/3/301
work_keys_str_mv AT qiangxingtian enhancingvisuallanguageprompttuningthroughsparseknowledgeguidedcontextoptimization
AT minzhang enhancingvisuallanguageprompttuningthroughsparseknowledgeguidedcontextoptimization