GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models

In the field of autonomous vehicles (AVs), accurately discerning commander intent and executing linguistic commands within a visual context presents a significant challenge. This paper introduces a sophisticated encoder-decoder framework developed to address visual grounding in AVs. Our Context-Awa...

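For orientation only, the sketch below illustrates the general idea of cross-modal attention named in the title: language-derived query tokens attend over visual region features so that a spoken or written command can be grounded in image content. It is a minimal, generic PyTorch example; the module name, feature dimensions, and interfaces are assumptions for illustration and are not taken from the article or its CAVG model.

```python
# Generic cross-modal attention sketch (illustrative; not the paper's implementation).
# Text-token queries attend over image-region keys/values, yielding text features
# enriched with visual context for grounding a command in the scene.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    def __init__(self, text_dim=768, vision_dim=512, embed_dim=256, num_heads=8):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)      # project language embeddings
        self.vision_proj = nn.Linear(vision_dim, embed_dim)  # project image-region features
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, text_tokens, region_feats):
        # text_tokens:  (batch, num_words, text_dim), e.g. command embeddings from an LLM
        # region_feats: (batch, num_regions, vision_dim), e.g. detector or ViT region features
        q = self.text_proj(text_tokens)
        kv = self.vision_proj(region_feats)
        grounded, _ = self.attn(query=q, key=kv, value=kv)
        return grounded  # text tokens fused with attended visual context


if __name__ == "__main__":
    # Toy usage: a 6-word command grounded against 10 candidate image regions.
    model = CrossModalAttention()
    out = model(torch.randn(1, 6, 768), torch.randn(1, 10, 512))
    print(out.shape)  # torch.Size([1, 6, 256])
```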

Bibliographic Details
Main Authors: Haicheng Liao, Huanming Shen, Zhenning Li, Chengyue Wang, Guofa Li, Yiming Bie, Chengzhong Xu
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Communications in Transportation Research
Online Access: http://www.sciencedirect.com/science/article/pii/S2772424723000276