Review and emerging trends of embodied agent based on multimodal large language models

Bibliographic Details
Main Authors: ZHAO Botao, KANG Zuheng, QU Xiaoyang, PENG Junqing, ZHANG Xulong, WANG Jianzong
Format: Article
Language: zho
Published: China InfoCom Media Group 2025-05-01
Series:大数据
Online Access:http://www.j-bigdataresearch.com.cn/zh/article/doi/10.11959/j.issn.2096-0271.2025035/
Description
Summary: Embodied agents are intelligent entities that can complete one or more tasks from instructions and interact with the physical environment. These agents have immense potential applications across various fields, such as service robotics, intelligent education, and assistive healthcare, and represent a crucial pathway toward realizing general-purpose robots. With the advancement of multimodal large language models, embodied agents have gained enhanced abilities in natural language understanding, reasoning, and environmental perception, significantly accelerating progress in this domain. Although many outstanding works have emerged in recent years, the field still lacks comprehensive surveys and targeted evaluations. To help researchers quickly and thoroughly understand developments in this area, an in-depth review and analysis is conducted. Multimodal large language models are introduced first, followed by datasets and a review of the physical carriers used to construct embodied agents. Three key research directions are then analyzed: embodied large models, high-level task planning, and low-level action control. Finally, the challenges and limitations of embodied agents are summarized and potential future directions are explored. This review serves as a foundational reference for the research community and is intended to foster further development and innovation in the field.
ISSN:2096-0271