Graph-Based Hierarchical Semantic Consistency Network for Remote Sensing Image–Text Retrieval

Bibliographic Details
Main Authors: Meiting Wang, Jie Guo, Bin Song, Kangxiang Su
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11031116/
Collection: DOAJ
Abstract: Remote sensing image-text retrieval (RSITR) is becoming increasingly essential for the efficient utilization of remote sensing (RS) data. Nevertheless, current approaches primarily focus on individual feature extraction strategies for the visual and textual modalities. They often lack effective feature aggregation strategies that fully leverage intramodal information integration and intermodal information interaction, which results in imprecise cross-modal feature alignment. In this article, we propose a novel graph-based hierarchical semantic consistency network, which enhances intramodal semantic associations through graph node communication and comprehensively explores the alignment of remote sensing images and texts through two dedicated modules: a Uni-modal Graph Aggregation (UGA) module and a Cross-modal Graph Aggregation (CGA) module. The UGA module adaptively integrates information of differing semantic significance within each feature graph, enabling accurate measurement of overall cross-modal semantic consistency. Furthermore, the CGA module facilitates cross-modal information interaction by constructing cross-modal relevance graphs from which fine-grained cross-modal similarity is inferred. Extensive experiments on the RSICD and RSITMD datasets validate the superior performance of our model on the RSITR task.
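The abstract outlines the architecture only at a high level. To make the two ideas it names more concrete (graph node communication with significance-weighted aggregation inside each modality, and a cross-modal relevance graph from which a fine-grained similarity is inferred), here is a minimal pure-Python sketch. It is not the authors' UGA or CGA implementation and uses no learned parameters: the softmax-over-cosine edge weights, the centroid-based stand-in for semantic significance, the word-to-region attention used for the fine-grained score, the 50/50 blend of global and fine-grained scores, and the helper names (`node_communication`, `uga_like_pool`, `cga_like_similarity`, `image_text_similarity`) are all assumptions made for illustration.

```python
import math

# A feature vector for one graph node (an image region or a text token).
Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(scores: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: list[float], vectors: list[Vector]) -> Vector:
    """Sum of vectors, each scaled by its matching weight."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def node_communication(nodes: list[Vector]) -> list[Vector]:
    """One round of message passing on a fully connected intramodal graph.
    Assumption: edge weights are a softmax over pairwise cosine similarities."""
    updated = []
    for node in nodes:
        weights = softmax([cosine(node, other) for other in nodes])
        message = weighted_sum(weights, nodes)
        updated.append([x + m for x, m in zip(node, message)])  # residual update
    return updated


def uga_like_pool(nodes: list[Vector]) -> Vector:
    """Hypothetical UGA-style aggregation: each node is weighted by a softmax
    over its cosine to the graph centroid, a stand-in for learned significance."""
    centroid = weighted_sum([1.0 / len(nodes)] * len(nodes), nodes)
    weights = softmax([cosine(node, centroid) for node in nodes])
    return weighted_sum(weights, nodes)


def cga_like_similarity(regions: list[Vector], words: list[Vector],
                        temperature: float = 0.1) -> float:
    """Hypothetical CGA-style fine-grained score. Word-region cosine affinities
    are the edges of a bipartite relevance graph; each word attends over its
    edges, and the score is the mean cosine between each word and its
    attended region summary."""
    scores = []
    for word in words:
        edges = [cosine(word, region) for region in regions]
        attended = weighted_sum(softmax(edges, temperature), regions)
        scores.append(cosine(word, attended))
    return sum(scores) / len(scores)


def image_text_similarity(regions: list[Vector], words: list[Vector],
                          alpha: float = 0.5) -> float:
    """Blend a global (pooled) score with a fine-grained score.
    alpha is a hypothetical mixing weight, not a value from the article."""
    regions = node_communication(regions)
    words = node_communication(words)
    global_score = cosine(uga_like_pool(regions), uga_like_pool(words))
    fine_score = cga_like_similarity(regions, words)
    return alpha * global_score + (1 - alpha) * fine_score


if __name__ == "__main__":
    # Toy 2-D "region" and "word" features, invented for illustration only.
    regions = [[1.0, 0.0], [0.0, 1.0]]
    matching_words = [[1.0, 0.0], [0.0, 1.0]]
    opposite_words = [[-1.0, 0.0], [0.0, -1.0]]
    print(image_text_similarity(regions, matching_words))
    print(image_text_similarity(regions, opposite_words))
```

In a trained model, region and word vectors would come from the visual and textual encoders and the aggregation weights would be learned end to end; fixed, parameter-free rules are used here only so that the sketch runs with the standard library alone. With the toy inputs, the matching word set scores higher than the opposite one.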
ISSN: 1939-1404, 2151-1535
DOI: 10.1109/JSTARS.2025.3578962
Volume: 18
Pages: 15334–15346
Author affiliations:
Meiting Wang (ORCID: https://orcid.org/0009-0007-5777-0662): Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China
Jie Guo (ORCID: https://orcid.org/0000-0003-4975-0315): State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China
Bin Song (ORCID: https://orcid.org/0000-0002-8096-3370): State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China
Kangxiang Su: Hangzhou Institute of Technology, Xidian University, Hangzhou, China
Subjects: Cross-modal similarity; feature aggregation; graph neural networks; hierarchical alignment; remote sensing image-text retrieval (RSITR)