Learning Manipulation from Expert Demonstrations Based on Multiple Data Associations and Physical Constraints

Abstract: Learning from demonstration is widely regarded as a promising paradigm for robots to acquire diverse skills. Unlike machines, which learn artificially from observation-action pairs, humans can imitate in a more versatile and effective manner, acquiring skills through mere “observation”. The Video to Command task is widely perceived as a promising approach to such task-based learning, yet it faces two key challenges: (1) the high redundancy and low frame rate of fine-grained action sequences make it difficult to manipulate objects robustly and accurately; (2) Video to Command models often prioritize the accuracy and richness of output commands over the robot's physical capabilities, leading to impractical or unsafe instructions. This article presents a novel Video to Command framework that employs multiple data associations and physical constraints. First, we introduce an object-level, appearance-contrasting multiple data association strategy that reliably associates manipulated objects in visually complex environments, capturing dynamic changes in video content. Then, we propose a multi-task Video to Command model that uses these object-level video content changes to compile expert demonstrations into manipulation commands. Finally, a multi-task hybrid loss function is proposed to train a Video to Command model that adheres to the constraints of the physical world and of the manipulation tasks. Our method achieves improvements of over 10% on BLEU_N, METEOR, ROUGE_L, and CIDEr over state-of-the-art methods. A dual-arm robot prototype was built to demonstrate the complete process of learning multiple skills from an expert demonstration and then executing the tasks on the robot.

Bibliographic Details
Main Authors: Yangqing Ye, Yaojie Mao, Shiming Qiu, Chuan’guo Tang, Zhirui Pan, Weiwei Wan, Shibo Cai, Guanjun Bao
Affiliations: College of Mechanical Engineering, Zhejiang University of Technology (all authors except Weiwei Wan); Graduate School of Engineering Science, Osaka University (Weiwei Wan)
Format: Article
Language: English
Published: SpringerOpen, 2025-04-01
Series: Chinese Journal of Mechanical Engineering, Vol. 38, Iss. 1 (2025)
ISSN: 2192-8258
Subjects: Videos to command; Multiple data associations; Multi-task model; Multi-task hybrid loss function; Physical constraints
Online Access: https://doi.org/10.1186/s10033-025-01204-y
Collection: DOAJ
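
The object-level data association step is only named in the abstract, not detailed in this record. As a rough illustration of what multi-object association by appearance generally looks like, the sketch below matches objects across frames by cosine similarity of appearance embeddings followed by Hungarian assignment; the function, threshold, and embedding source are all assumptions, not the authors' method.

```python
# Minimal sketch of appearance-based multi-object association (hypothetical;
# the paper's "appearance-contrasting" strategy is not specified here).
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_feats: np.ndarray, curr_feats: np.ndarray,
              min_sim: float = 0.5) -> list[tuple[int, int]]:
    """Match tracked objects to current detections by embedding similarity.

    prev_feats: (M, D) L2-normalized appearance embeddings of tracked objects.
    curr_feats: (N, D) L2-normalized appearance embeddings of new detections.
    Returns (prev_idx, curr_idx) pairs whose cosine similarity >= min_sim.
    """
    sim = prev_feats @ curr_feats.T            # (M, N) cosine similarities
    rows, cols = linear_sum_assignment(-sim)   # Hungarian: maximize similarity
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= min_sim]
```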
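
Likewise, the multi-task hybrid loss function is only named in the abstract. A minimal sketch of the general idea, assuming a cross-entropy command-generation term plus a weighted penalty for physically infeasible predictions (the module, weight, and feasibility signal below are hypothetical, not the authors' formulation):

```python
import torch
import torch.nn as nn

class HybridV2CLoss(nn.Module):
    """Hypothetical multi-task hybrid loss: command-generation cross-entropy
    plus a weighted penalty for physically infeasible predictions."""

    def __init__(self, lambda_phys: float = 0.5, pad_idx: int = 0):
        super().__init__()
        self.ce = nn.CrossEntropyLoss(ignore_index=pad_idx)
        self.lambda_phys = lambda_phys  # assumed trade-off weight

    def forward(self, logits, targets, feasibility):
        # logits: (batch, seq_len, vocab) token scores for predicted commands
        # targets: (batch, seq_len) ground-truth command tokens
        # feasibility: (batch,) in [0, 1]; 1 means the predicted command
        #   satisfies the physical constraints (e.g., reachability, grasp force)
        gen_loss = self.ce(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
        phys_loss = (1.0 - feasibility).mean()  # penalize infeasible commands
        return gen_loss + self.lambda_phys * phys_loss
```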
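
The reported gains are measured with standard captioning metrics (BLEU_N, METEOR, ROUGE_L, CIDEr). The record does not say which implementation the authors used; the widely used pycocoevalcap package computes them as sketched below (the toy video id and command strings are illustrative only):

```python
# Typical captioning-metric evaluation with pycocoevalcap
# (pip install pycocoevalcap; METEOR additionally requires a Java runtime).
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

# Both inputs map a video id to a list of command strings.
gts = {"vid0": ["pick up the red cup and place it on the tray"]}  # references
res = {"vid0": ["pick up the red cup and put it on the tray"]}    # predictions

for name, scorer in [("BLEU_N", Bleu(4)), ("METEOR", Meteor()),
                     ("ROUGE_L", Rouge()), ("CIDEr", Cider())]:
    score, _ = scorer.compute_score(gts, res)
    print(name, score)  # Bleu(4) returns a list of BLEU_1..BLEU_4 scores
```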