Image region semantic enhancement and symmetric semantic completion for text-to-image person search
Abstract Mask learning has emerged as a promising approach for Text-to-Image Person Search (TIPS), yet it faces two key challenges: (1) There tends to be semantic inconsistency between image regions and text phrases. (2) Current approaches primarily focus on masking text tokens to facilitate cross-m...
| Main Authors: | Ting Tuo, Lijun Guo, Rong Zhang, Yirui Wang, Jiangbo Qian, Shangce Gao |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-00904-8 |
| Field | Value |
|---|---|
| author | Ting Tuo; Lijun Guo; Rong Zhang; Yirui Wang; Jiangbo Qian; Shangce Gao |
| author_sort | Ting Tuo |
| collection | DOAJ |
| description | Mask learning has emerged as a promising approach for Text-to-Image Person Search (TIPS), yet it faces two key challenges: (1) There tends to be semantic inconsistency between image regions and text phrases. (2) Current approaches primarily focus on masking text tokens to facilitate cross-modal alignment, overlooking the important role that text plays in guiding the learning of intricate details within images, which can lead to missed opportunities for capturing these details. In this paper, we are excited to introduce our proposed method called Image Region Semantic Enhancement and Symmetric Semantic Completion (RE-SSC). Specifically, our approach comprises two main components: Image Region Semantic Enhancement (IRSE) and Symmetric Semantic Completion (SSC). In IRSE, we initially apply superpixel segmentation to partition images into distinct patches based on low-level semantics. Subsequently, we leverage self-supervised consistency learning to transfer high-level semantic information from the global context of the image for local patches, enhancing local patch semantics. Within the SSC component, we have designed a symmetric semantic completion learning process that operates in both textual and visual directions, emphasizing global as well as local token learning to achieve effective alignment across modalities. We evaluated our method on three public datasets and are pleased to report competitive performance in addressing text-to-image pedestrian searches. |
| format | Article |
| id | doaj-art-b4b035a3ac7b470ebd6de7898c2e100b |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-b4b035a3ac7b470ebd6de7898c2e100b; 2025-08-20T03:38:16Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-07-01; 151116; 10.1038/s41598-025-00904-8; Image region semantic enhancement and symmetric semantic completion for text-to-image person search; Ting Tuo, Lijun Guo, Rong Zhang, Yirui Wang, Jiangbo Qian (Faculty of Electrical Engineering and Computer Science Ningbo, Ningbo University); Shangce Gao (Faculty of Engineering, University of Toyama); https://doi.org/10.1038/s41598-025-00904-8 |
| title | Image region semantic enhancement and symmetric semantic completion for text-to-image person search |
| url | https://doi.org/10.1038/s41598-025-00904-8 |