An Ensemble of Vision-Language Transformer-Based Captioning Model With Rotatory Positional Embeddings
Image captioning is a dynamic and crucial research area focused on automatically generating textual descriptions of images. Traditional models, primarily employing an encoder-decoder framework with Convolutional Neural Networks (CNNs), often struggle to capture the complex spatial and sequential relatio...
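The rotary positional embeddings named in the title encode token order by rotating pairs of feature dimensions through position-dependent angles, rather than adding a separate position vector. A minimal NumPy sketch of the general technique follows; the function name and the split-halves pairing layout are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary positional embeddings to a sequence of vectors.

    x: array of shape (seq_len, dim), with dim even. Features are split
    into two halves; each pair (x1[i], x2[i]) is rotated by an angle that
    grows with the token position, so order is encoded multiplicatively.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One frequency per feature pair: theta_i = base^(-2i / dim).
    freqs = base ** (-np.arange(half) * 2.0 / dim)
    # Rotation angles: position * frequency, shape (seq_len, half).
    angles = np.arange(seq_len)[:, None] * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2-D rotation applied to each feature pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each pair undergoes a pure rotation, the transform preserves vector norms and leaves position 0 unchanged (all angles are zero there), which is one reason this scheme composes cleanly with attention scores.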
| Main Authors: | K. B. Sathyanarayana, Dinesh Naik |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/10946097/ |
Similar Items
- Short Text Classification Based on Enhanced Word Embedding and Hybrid Neural Networks
  by: Cunhe Li, et al. Published: (2025-05-01)
- ADFCNN-BiLSTM: A Deep Neural Network Based on Attention and Deformable Convolution for Network Intrusion Detection
  by: Bin Li, et al. Published: (2025-02-01)
- SAGCN: Self-Attention Graph Convolutional Network for Human Pose Embedding
  by: Zhongxiong Xu, et al. Published: (2025-01-01)
- Hyperband-Optimized CNN-BiLSTM with Attention Mechanism for Corporate Financial Distress Prediction
  by: Yingying Song, et al. Published: (2025-05-01)
- Integrating visual memory for image captioning
  by: Jiahui Wei, et al. Published: (2025-05-01)