Integrating visual memory for image captioning

Abstract: Most existing image captioning models take region-level features extracted by object detectors as input and achieve strong performance. However, although region features provide high-level semantic information, they are still limited by their local nature and by detector performance and inevitabl...

Bibliographic Details
Main Authors:	Jiahui Wei, Tongtong Wu
Format:	Article
Language:	English
Published:	Springer 2025-05-01
Series:	Discover Applied Sciences
Subjects:	Image captioning, Memory mechanism, Transformer, Attention
Online Access:	https://doi.org/10.1007/s42452-025-07045-7

Similar Items

Image Captioning Based on Semantic Scenes
by: Fengzhi Zhao, et al.
Published: (2024-10-01)

Remote Sensing Image Change Captioning Using Multi-Attentive Network with Diffusion Model
by: Yue Yang, et al.
Published: (2024-11-01)

Chinese Image Captioning Based on Deep Fusion Feature and Multi-Layer Feature Filtering Block
by: Xi Yang, et al.
Published: (2025-01-01)

Dual-Stream Spatially Aware Transformer for Remote Sensing Image Captioning
by: Haifeng Sima, et al.
Published: (2025-01-01)

PBC-Transformer: Interpreting Poultry Behavior Classification Using Image Caption Generation Techniques
by: Jun Li, et al.
Published: (2025-05-01)

Tiny TR-CAP: A novel small-scale benchmark dataset for general-purpose image captioning tasks
by: Abbas Memiş, et al.
Published: (2025-04-01)

Semantic–Spatial Feature Fusion With Dynamic Graph Refinement for Remote Sensing Image Captioning
by: Maofu Liu, et al.
Published: (2025-01-01)

Improved IEC performance via emotional stimuli-aware captioning
by: Zibo Zhou, et al.
Published: (2025-07-01)

A novel image captioning model with visual-semantic similarities and visual representations re-weighting
by: Alaa Thobhani, et al.
Published: (2024-09-01)

A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow
by: Zhenqiang Zhao, et al.
Published: (2025-06-01)

Improving Visual Question Answering by Image Captioning
by: Xiangjun Shao, et al.
Published: (2025-01-01)

Model Semantic Attention (SemAtt) With Hybrid Learning Separable Neural Network and Long Short-Term Memory to Generate Caption
by: Agus Nursikuwagus, et al.
Published: (2024-01-01)

Feature refinement and rethinking attention for remote sensing image captioning
by: Yunpeng Li, et al.
Published: (2025-03-01)

Affective Image Captioning for Visual Artworks Using Emotion-Based Cross-Attention Mechanisms
by: Shintaro Ishikawa, et al.
Published: (2023-01-01)

Thangka image captioning model with Salient Attention and Local Interaction Aggregator
by: Wenjin Hu, et al.
Published: (2024-11-01)

Integrating Abstract Meaning Representation to Enhance Transformer-Based Image Captioning
by: Nguyen Van Thinh, et al.
Published: (2025-01-01)

Enhanced group relation learning via aligned attention masking for fashion product captioning
by: Yuhao Tang, et al.
Published: (2025-08-01)

Novel Advance Image Caption Generation Utilizing Vision Transformer and Generative Adversarial Networks
by: Shourya Tyagi, et al.
Published: (2024-11-01)

A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions
by: Ke Zhang, et al.
Published: (2024-11-01)

Auto-Scenario Generator for Autonomous Vehicle Safety: Multi-Modal Attention-Based Image Captioning Model Using Digital Twin Data
by: Hojun Lee, et al.
Published: (2024-01-01)

Attribute-Based Learning for Remote Sensing Image Captioning in Unseen Scenes
by: Zhang Guo, et al.
Published: (2025-03-01)

MFEAM: Multi-View Feature Enhanced Attention Model for Image Captioning
by: Yang Cui, et al.
Published: (2025-07-01)

Offline visual aid system for the blind based on image captioning
by: Yue CHEN, et al.
Published: (2022-01-01)

Contrastive learning based remote sensing text-to-image generation for few-shot remote sensing image captioning
by: Haonan Zhou, et al.
Published: (2025-08-01)

Semantic-Guided Selective Representation for Image Captioning
by: Yinan Li, et al.
Published: (2023-01-01)

Research on Digital Media Art for Image Caption Generation Based on Integrated Transformer Models in CLIP
by: Lu Gao, et al.
Published: (2025-01-01)

Combining Region-Guided Attention and Attribute Prediction for Thangka Image Captioning Method
by: Fujun Zhang, et al.
Published: (2025-01-01)

An Ensemble of Vision-Language Transformer-Based Captioning Model With Rotatory Positional Embeddings
by: K. B. Sathyanarayana, et al.
Published: (2025-01-01)

The CLIP - GPT Image Captioning Model Integrated with Global Semantics
by: TAO Rui, et al.
Published: (2024-04-01)

Visual Content Captioning and Audio Conversion using CNN-RNN with Attention Model
by: Aldy Agil Hermanto, et al.
Published: (2025-06-01)

A Patch-Level Region-Aware Module with a Multi-Label Framework for Remote Sensing Image Captioning
by: Yunpeng Li, et al.
Published: (2024-10-01)

Detailed Image Captioning and Hashtag Generation
by: Nikshep Shetty, et al.
Published: (2024-11-01)

Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation
by: Enas Abbas Abed, et al.
Published: (2025-06-01)

Systematic Literature Review on Medical Image Captioning Using CNN-LSTM and Transformer-Based Models
by: Husni Fadhilah, et al.
Published: (2025-05-01)

AVCaps: An Audio-Visual Dataset With Modality-Specific Captions
by: Parthasaarathy Sudarsanam, et al.
Published: (2025-01-01)

DCAT: A Novel Transformer-Based Approach for Dynamic Context-Aware Image Captioning in the Tamil Language
by: Jothi Prakash Venugopal, et al.
Published: (2025-04-01)

Preliminary Study on Image Captioning for Construction Hazards
by: Wen-Ta Hsiao, et al.
Published: (2024-08-01)

Fit for What Purpose? NER Certification of Automatic Captions in English and Spanish
by: Pablo Romero-Fresco, et al.
Published: (2025-01-01)

Enhanced CLIP-GPT Framework for Cross-Lingual Remote Sensing Image Captioning
by: Rui Song, et al.
Published: (2025-01-01)

NLP-Based Fusion Approach to Robust Image Captioning
by: Riccardo Ricci, et al.
Published: (2024-01-01)