Large multimodal model for open vocabulary semantic segmentation of remote sensing images

Bibliographic Details
Main Authors: Bing Liu, Xiaohui Chen, Anzhu Yu, Fan Feng, Jiaying Yue, Xuchu Yu
Format: Article
Language: English
Published: Taylor & Francis Group 2025-12-01
Series: European Journal of Remote Sensing
Online Access: https://www.tandfonline.com/doi/10.1080/22797254.2024.2447344
Description
Summary: Conventional remote sensing image semantic segmentation requires training specialized models for specific categories of ground objects, and such models often fail to recognize categories unseen during training. The generalization ability of the model is therefore key to achieving open vocabulary semantic segmentation of remote sensing images. Recently, large multimodal models pre-trained on massive amounts of image and text data have demonstrated strong generalization capabilities. Inspired by their success, we propose an open vocabulary segmentation method for remote sensing images. The proposed method combines the large multimodal model LLaVA with the large vision model SAM to achieve open vocabulary segmentation. Specifically, LLaVA is used to understand the remote sensing image and the open vocabulary, while SAM extracts visual features from the image. The features produced by SAM and LLaVA are then fed into a mask decoder to complete the semantic segmentation task. To verify the effectiveness of the proposed method, we conducted extensive experiments on multiple ground object categories, including airplane, ship, river, and lake. The qualitative and quantitative evaluation results confirm the effectiveness of the proposed method.
ISSN:2279-7254
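
The summary above describes a pipeline in which a SAM-like vision encoder supplies pixel features, a LLaVA-like multimodal model supplies embeddings for the open-vocabulary class names, and a mask decoder fuses the two into per-class masks. The minimal PyTorch sketch below illustrates that fusion step only; all module names, feature dimensions, and the dot-product decoding scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: fuse vision-encoder features (SAM-like) with
# per-class text embeddings (LLaVA-like) in a simple mask decoder.
# Shapes and architecture are assumptions for illustration only.
import torch
import torch.nn as nn


class MaskDecoder(nn.Module):
    """Combines visual features and class embeddings into mask logits."""

    def __init__(self, vis_dim: int = 256, txt_dim: int = 256):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, vis_dim)   # align text features to the visual space
        self.fuse = nn.Conv2d(vis_dim, vis_dim, 1)    # lightweight pixel-feature projection

    def forward(self, vis_feats: torch.Tensor, txt_feats: torch.Tensor) -> torch.Tensor:
        # vis_feats: (B, C, H, W) placeholder for SAM image features
        # txt_feats: (B, K, D) one embedding per open-vocabulary class
        q = self.txt_proj(txt_feats)                  # (B, K, C)
        v = self.fuse(vis_feats)                      # (B, C, H, W)
        # Dot product between each class embedding and every pixel feature
        return torch.einsum("bkc,bchw->bkhw", q, v)   # (B, K, H, W) mask logits


# Toy usage with random stand-in features for 4 classes (airplane, ship, river, lake)
vis_feats = torch.randn(1, 256, 64, 64)               # stand-in for SAM output
txt_feats = torch.randn(1, 4, 256)                    # stand-in for LLaVA class embeddings
masks = MaskDecoder()(vis_feats, txt_feats)
print(masks.shape)                                    # torch.Size([1, 4, 64, 64])
```

In this sketch each class mask is obtained by scoring every pixel feature against that class's text-conditioned embedding, which is one common way open-vocabulary segmenters turn language embeddings into dense predictions; the paper's actual decoder may differ.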