Contrastive learning based remote sensing text-to-image generation for few-shot remote sensing image captioning

In few-shot scenarios, the lack of caption-labeled samples and prior knowledge leads to insufficient training and performance degradation of remote sensing image captioning (RC) models. We propose an iterative remote sensing image captioning method named IRIC to promote RC model performance iteratio...
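The full description in this record says that the text-to-image model CRTI is based on contrastive learning, but it does not state which objective is used. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over paired text and image embeddings; the function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the article.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_info_nce(text_emb, image_emb, temperature=0.07):
    """Assumed contrastive objective: text i should match image i and vice versa."""
    t = [l2_normalize(v) for v in text_emb]
    im = [l2_normalize(v) for v in image_emb]
    # Cosine similarity of every text with every image, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(tv, iv)) / temperature for iv in im]
              for tv in t]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-text direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy embeddings (hypothetical): two captions and two images.
text = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]   # image i agrees with caption i
shuffled = [matched[1], matched[0]]  # pairs deliberately mismatched

loss_matched = symmetric_info_nce(text, matched)
loss_shuffled = symmetric_info_nce(text, shuffled)
print(loss_matched < loss_shuffled)
```

Under this assumed objective, correctly paired captions and images yield a lower loss than mismatched pairs, which is the property a contrastively trained text-to-image model would rely on to keep generated images semantically consistent with their text.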

Bibliographic Details
Main Authors:	Haonan Zhou, Hang Tang, Xiangchun Liu, Xiaoxiao Shi, Lurui Xia
Format:	Article
Language:	English
Published:	Taylor & Francis Group 2025-08-01
Series:	International Journal of Digital Earth
Subjects:	Remote sensing image captioning remote sensing text-to-image generation contrastive learning caption-labeled sample amplification
Online Access:	https://www.tandfonline.com/doi/10.1080/17538947.2025.2526102

_version_	1849224357312724992
author	Haonan Zhou Hang Tang Xiangchun Liu Xiaoxiao Shi Lurui Xia
author_facet	Haonan Zhou Hang Tang Xiangchun Liu Xiaoxiao Shi Lurui Xia
author_sort	Haonan Zhou
collection	DOAJ
description	In few-shot scenarios, the lack of caption-labeled samples and prior knowledge leads to insufficient training and degraded performance of remote sensing image captioning (RC) models. We propose an iterative remote sensing image captioning method named IRIC that iteratively improves RC model performance and generates higher-quality captions. IRIC first constructs a remote sensing text-to-image model, CRTI, based on contrastive learning, which can generate remote sensing images with the same semantic content from text and achieve text-driven remote sensing image transformation. Subsequently, caption-labeled sample amplification with prior knowledge introduction is implemented, which incorporates prior knowledge into the text-driven remote sensing image transformation to amplify the caption-labeled samples. Finally, the amplified caption-labeled samples are added to the original training set, and the RC model is retrained to achieve iterative performance improvement. The experimental results show that IRIC is highly effective in few-shot scenarios and can iteratively improve the CIDEr scores of the latest few-shot RC model by 8.5%.
format	Article
id	doaj-art-4b80a9b4756c4a8d9706f854e062efd1
institution	Kabale University
issn	1753-8947 1753-8955
language	English
publishDate	2025-08-01
publisher	Taylor & Francis Group
record_format	Article
series	International Journal of Digital Earth
spelling	doaj-art-4b80a9b4756c4a8d9706f854e062efd1 | 2025-08-25T11:25:06Z | eng | Taylor & Francis Group | International Journal of Digital Earth | 1753-8947 | 1753-8955 | 2025-08-01 | 18 | 1 | 10.1080/17538947.2025.2526102 | Contrastive learning based remote sensing text-to-image generation for few-shot remote sensing image captioning | Haonan Zhou (Beijing Institute of Remote Sensing Information, Beijing, People’s Republic of China); Hang Tang (Beijing Institute of Remote Sensing Information, Beijing, People’s Republic of China); Xiangchun Liu (Beijing Institute of Remote Sensing Information, Beijing, People’s Republic of China); Xiaoxiao Shi (Beijing Institute of Remote Sensing Information, Beijing, People’s Republic of China); Lurui Xia (Space Engineering University, Beijing, People’s Republic of China) | In few-shot scenarios, the lack of caption-labeled samples and prior knowledge leads to insufficient training and performance degradation of remote sensing image captioning (RC) models. We propose an iterative remote sensing image captioning method named IRIC to promote RC model performance iteration and generate higher quality captions. The IRIC first constructs a remote sensing text-to-image model CRTI based on contrastive learning, which can generate remote sensing images with the same semantic content from text and achieve text-driven remote sensing image transformation; Subsequently, caption-labeled sample amplification with prior knowledge introduction is implemented, which incorporates prior knowledge into the text-driven remote sensing image transformation to achieve caption-labeled sample amplification; Finally, the amplified caption-labeled samples are added to the original train set, and the RC model is retrained to achieve iterative performance improvement. The experimental results show that the IRIC is highly effective in few-shot scenarios and can iteratively improve the CIDEr scores of the latest few-shot RC model by 8.5%. | https://www.tandfonline.com/doi/10.1080/17538947.2025.2526102 | Remote sensing image captioning; remote sensing text-to-image generation; contrastive learning; caption-labeled sample amplification
title	Contrastive learning based remote sensing text-to-image generation for few-shot remote sensing image captioning
topic	Remote sensing image captioning remote sensing text-to-image generation contrastive learning caption-labeled sample amplification
url	https://www.tandfonline.com/doi/10.1080/17538947.2025.2526102


Similar Items