RLita: A Region-Level Image–Text Alignment Method for Remote Sensing Foundation Model

Foundation model fine-tuning has gradually become a research hotspot owing to the development of generative pretrained transformers. However, compared with natural scene images, remote sensing images have a wide range of spatial scales, complex objects, and limited labelled sample...

Bibliographic Details
Main Authors:	Qiang Zhang, Decheng Wang, Xiao Yu
Format:	Article
Language:	English
Published:	MDPI AG 2025-05-01
Series:	Remote Sensing
Subjects:	remote sensing; foundation model optimization; object detection; semantic segmentation; change detection
Online Access:	https://www.mdpi.com/2072-4292/17/10/1661

_version_	1849326818081898496
author	Qiang Zhang; Decheng Wang; Xiao Yu
author_sort	Qiang Zhang
collection	DOAJ
description	Foundation model fine-tuning has gradually become a research hotspot owing to the development of generative pretrained transformers. However, compared with natural scene images, remote sensing images have a wide range of spatial scales, complex objects, and limited labelled samples, which pose great challenges for image interpretation. To reduce the gap between natural scene images and remote sensing images, this paper proposes RLita, a novel optimization method for foundation models. Specifically, a region-level image–text alignment optimization method is proposed that represents image and text features as visual and semantic representation vectors in a single embedding space for better model generalization, and a parameter-efficient tuning strategy is designed to reduce the computational resources required. Experiments on five remote sensing datasets covering object detection, semantic segmentation, and change detection demonstrate the effectiveness of the RLita method.
format	Article
id	doaj-art-fed79724f79a4191b8720c3c8106be82
institution	Kabale University
issn	2072-4292
language	English
publishDate	2025-05-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj-art-fed79724f79a4191b8720c3c8106be82; 2025-08-20T03:48:02Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2025-05-01; 17; 10; 1661; 10.3390/rs17101661; RLita: A Region-Level Image–Text Alignment Method for Remote Sensing Foundation Model; Qiang Zhang (0), Decheng Wang (1), Xiao Yu (2); (0, 1, 2) Beijing Institute of Tracking and Telecommunication Technology, Beijing 100094, China; https://www.mdpi.com/2072-4292/17/10/1661; remote sensing; foundation model optimization; object detection; semantic segmentation; change detection
title	RLita: A Region-Level Image–Text Alignment Method for Remote Sensing Foundation Model
topic	remote sensing; foundation model optimization; object detection; semantic segmentation; change detection
url	https://www.mdpi.com/2072-4292/17/10/1661

Similar Items