Semantic-Guided Selective Representation for Image Captioning

Bibliographic Details
Main Authors: Yinan Li, Yiwei Ma, Yiyi Zhou, Xiao Yu
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10041895/
Description
Summary: Grid-based features have proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, their application to image captioning encounters two main issues: noisy features and fragmented semantics. In this paper, we propose a novel feature selection scheme comprising a Relation-Aware Selection (RAS) module and a Fine-grained Semantic Guidance (FSG) learning strategy. Based on grid-wise interactions, RAS enhances salient visual regions and channels while suppressing less important ones. This selection process is guided by FSG, which uses fine-grained semantic knowledge as supervision. Experimental results on MS COCO show that the proposed RAS-FSG scheme achieves state-of-the-art performance in both offline and online testing, i.e., 134.3 CIDEr offline and 135.4 CIDEr online. Extensive ablation studies and visualizations further validate the effectiveness of our scheme.
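The abstract does not give implementation details for RAS. Purely as an illustration of the general idea it describes (relating grid features to one another, then reweighting salient spatial positions and channels), a rough sketch might combine self-attention with sigmoid gates; the attention form, the gating functions, and the function name below are all assumptions, not the authors' actual method:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_selection(grids):
    """Hypothetical sketch: scaled dot-product self-attention models
    grid-wise interactions, then sigmoid gates reweight spatial
    positions and feature channels (enhance salient, suppress weak).

    grids: (N, D) array of N grid-cell features with D channels.
    """
    n, d = grids.shape
    # grid-wise interactions via self-attention over the N cells
    attn = softmax(grids @ grids.T / np.sqrt(d), axis=-1)
    related = attn @ grids                      # (N, D) relation-aware features
    # spatial gate: one score in (0, 1) per grid cell
    spatial_gate = 1.0 / (1.0 + np.exp(-related.mean(axis=1, keepdims=True)))
    # channel gate: one score in (0, 1) per channel (squeeze-and-excitation style)
    channel_gate = 1.0 / (1.0 + np.exp(-related.mean(axis=0, keepdims=True)))
    return grids * spatial_gate * channel_gate  # selectively scaled features
```

In the paper this selection is additionally supervised by the FSG strategy; the sketch above only covers the unsupervised reweighting step.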
ISSN:2169-3536