CLIP-Based Grid Features and Masking for Remote Sensing Image Captioning

Remote sensing image (RSI) captioning is a vision-language multimodal task that aims to describe image content in natural language, facilitating accurate and convenient comprehension of RSIs. Existing methods primarily focus on extracting visual features using vision-task pretraining models, such as...

Bibliographic Details
Main Authors: Qiaoling Lin, Shuang Wang, Xiutiao Ye, Ruixuan Wang, Rui Yang, Licheng Jiao
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/10806569/