LARE: Latent augmentation using regional embedding with vision-language model

In recent years, considerable research has been conducted on vision-language models (VLMs) that handle both image and text data; these models are being applied to diverse downstream tasks, such as “image-related chat,” “image recognition by instruction,” and “answering visual questions.” Vision-lang...

Full description

Saved in:
Bibliographic Details
Main Authors: Kosuke Sakurai, Tatsuya Ishii, Ryotaro Shimizu, Linxin Song, Masayuki Goto
Format: Article
Language:English
Published: Elsevier 2025-06-01
Series:Machine Learning with Applications
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2666827025000544
Tags: Add Tag
No Tags, Be the first to tag this record!