News Image Captioning via Separate Attention on Entity Categories

Bibliographic Details
Main Authors: Sonali Ajankar, Tanima Dutta
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11048780/
Description
Summary: News image captioning generates descriptive, informative captions for news images by using the context of the accompanying news article. The task aims to capture detailed information, including multiple types of named entities such as persons, organizations, locations, and events. However, identifying named entities from an image is challenging. To address this, we propose a novel approach that categorizes the key named entities into person entities and geoOrg entities and applies distinct attention to each category, which ensures focused extraction of relevant information from the image. Even with this approach, a single news image cannot provide comprehensive background details, which leads to a lack of useful context. The context missing from one image may, however, be present in another image of the same article. We therefore propose incorporating the nearby image as an additional input, since structural proximity implies contextual relevance. This multimodal cue facilitates the accumulation of contextualized features that effectively capture contextually rich information from the article. Experimental results demonstrate the effectiveness of the proposed approach on the GoodNews, NYTimes800k, and DM800K datasets for news image captioning, achieving an improvement of 0.4 BLEU-4 over the state of the art on the DM800K dataset.
ISSN:2169-3536