RLita: A Region-Level Image–Text Alignment Method for Remote Sensing Foundation Model
| Main Authors: | Qiang Zhang, Decheng Wang, Xiao Yu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Remote Sensing |
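| Subjects: | remote sensing; foundation model optimization; object detection; semantic segmentation; change detection |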
| Online Access: | https://www.mdpi.com/2072-4292/17/10/1661 |
| Field | Value |
|---|---|
| author | Qiang Zhang; Decheng Wang; Xiao Yu |
| collection | DOAJ |
| description | Fine-tuning optimization methods for foundation models have gradually become a research hotspot with the development of generative pretrained transformers. However, compared with natural scene images, remote sensing images span a wide range of spatial scales, contain complex objects, and come with limited labelled samples, all of which pose great challenges for image interpretation. To narrow the gap between natural scene images and remote sensing images, this paper proposes RLita, a novel optimization method for foundation models. Specifically, a region-level image–text alignment method represents image and text features as visual and semantic representation vectors in a single shared embedding space to improve model generalization, and a parameter-efficient tuning strategy is designed to reduce the computational resources required. Experiments on five remote sensing datasets covering object detection, semantic segmentation, and change detection demonstrate the effectiveness of the RLita method. |
| format | Article |
| id | doaj-art-fed79724f79a4191b8720c3c8106be82 |
| institution | Kabale University |
| issn | 2072-4292 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Remote Sensing |
| doi | 10.3390/rs17101661 |
| volume | 17 |
| issue | 10 |
| article_number | 1661 |
| affiliation | Beijing Institute of Tracking and Telecommunication Technology, Beijing 100094, China (all three authors) |
| title | RLita: A Region-Level Image–Text Alignment Method for Remote Sensing Foundation Model |
| topic | remote sensing; foundation model optimization; object detection; semantic segmentation; change detection |
| url | https://www.mdpi.com/2072-4292/17/10/1661 |