DynamicVLN: Incorporating Dynamics into Vision-and-Language Navigation Scenarios
Traditional Vision-and-Language Navigation (VLN) tasks require an agent to navigate static environments using natural language instructions. However, real-world road conditions such as vehicle movements, traffic signal fluctuations, pedestrian activity, and weather variations are dynamic and continually changing. These factors significantly impact an agent’s decision-making ability, underscoring the limitations of current VLN models, which do not accurately reflect the complexities of real-world navigation. To bridge this gap, we propose a novel task called Dynamic Vision-and-Language Navigation (DynamicVLN), incorporating various dynamic scenarios to enhance the agent’s decision-making abilities and adaptability. By redefining the VLN task, we emphasize that a robust and generalizable agent should not rely solely on predefined instructions but must also demonstrate reasoning skills and adaptability to unforeseen events. Specifically, we have designed ten scenarios that simulate the challenges of dynamic navigation and developed a dedicated dataset of 11,261 instances using the CARLA simulator (ver.0.9.13) and a large language model to provide realistic training conditions. Additionally, we introduce a baseline model that integrates advanced perception and decision-making modules, enabling effective navigation and interpretation of the complexities of dynamic road conditions. This model showcases the ability to follow natural language instructions while dynamically adapting to environmental cues. Our approach establishes a benchmark for developing agents capable of functioning in real-world, dynamic environments and extending beyond the limitations of static VLN tasks to more practical and versatile applications.
Main Authors: | Yanjun Sun, Yue Qiu, Yoshimitsu Aoki |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Sensors |
Subjects: | vision-and-language navigation; dynamic change; decision-making |
Online Access: | https://www.mdpi.com/1424-8220/25/2/364 |
author | Yanjun Sun Yue Qiu Yoshimitsu Aoki |
collection | DOAJ |
description | Traditional Vision-and-Language Navigation (VLN) tasks require an agent to navigate static environments using natural language instructions. However, real-world road conditions such as vehicle movements, traffic signal fluctuations, pedestrian activity, and weather variations are dynamic and continually changing. These factors significantly impact an agent’s decision-making ability, underscoring the limitations of current VLN models, which do not accurately reflect the complexities of real-world navigation. To bridge this gap, we propose a novel task called Dynamic Vision-and-Language Navigation (DynamicVLN), incorporating various dynamic scenarios to enhance the agent’s decision-making abilities and adaptability. By redefining the VLN task, we emphasize that a robust and generalizable agent should not rely solely on predefined instructions but must also demonstrate reasoning skills and adaptability to unforeseen events. Specifically, we have designed ten scenarios that simulate the challenges of dynamic navigation and developed a dedicated dataset of 11,261 instances using the CARLA simulator (ver.0.9.13) and large language model to provide realistic training conditions. Additionally, we introduce a baseline model that integrates advanced perception and decision-making modules, enabling effective navigation and interpretation of the complexities of dynamic road conditions. This model showcases the ability to follow natural language instructions while dynamically adapting to environmental cues. Our approach establishes a benchmark for developing agents capable of functioning in real-world, dynamic environments and extending beyond the limitations of static VLN tasks to more practical and versatile applications. |
format | Article |
id | doaj-art-c7d204d51c304880b75bdadb5d5a4482 |
institution | Kabale University |
issn | 1424-8220 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj-art-c7d204d51c304880b75bdadb5d5a4482; 2025-01-24T13:48:39Z; eng; MDPI AG; Sensors; 1424-8220; 2025-01-01; 25(2), 364; doi:10.3390/s25020364; DynamicVLN: Incorporating Dynamics into Vision-and-Language Navigation Scenarios; Yanjun Sun (Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan); Yue Qiu (National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Umezono, Tsukuba 305-8560, Japan); Yoshimitsu Aoki (Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan); https://www.mdpi.com/1424-8220/25/2/364 |
title | DynamicVLN: Incorporating Dynamics into Vision-and-Language Navigation Scenarios |
topic | vision-and-language navigation dynamic change decision-making |
url | https://www.mdpi.com/1424-8220/25/2/364 |