Learning Manipulation from Expert Demonstrations Based on Multiple Data Associations and Physical Constraints
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2025-04-01 |
| Series: | Chinese Journal of Mechanical Engineering |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s10033-025-01204-y |
| Summary: | Learning from demonstration is widely regarded as a promising paradigm for robots to acquire diverse skills. Unlike machines, which learn artificially from observation-action pairs, humans can imitate in a more versatile and effective manner, acquiring skills through mere "observation". The Video-to-Command task is widely viewed as a promising approach to task-based learning, yet it faces two key challenges: (1) the high redundancy and low frame rate of fine-grained action sequences make it difficult to manipulate objects robustly and accurately, and (2) Video-to-Command models often prioritize the accuracy and richness of output commands over physical feasibility, leading to impractical or unsafe instructions for robots. This article presents a novel Video-to-Command framework that employs multiple data associations and physical constraints. First, we introduce an object-level, appearance-contrasting multiple data association strategy that reliably associates manipulated objects in visually complex environments, capturing dynamic changes in video content. Then, we propose a multi-task Video-to-Command model that uses these object-level content changes to compile expert demonstrations into manipulation commands. Finally, we propose a multi-task hybrid loss function to train the Video-to-Command model so that it adheres to the constraints of the physical world and of the manipulation tasks. Our method achieves improvements of more than 10% in BLEU_N, METEOR, ROUGE_L, and CIDEr over state-of-the-art methods. A dual-arm robot prototype was built to demonstrate the complete process of learning multiple skills from an expert demonstration and then executing the tasks on the robot. |
|---|---|
| ISSN: | 2192-8258 |
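
The record above describes the multi-task hybrid loss only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes a PyTorch setup in which a command-generation cross-entropy term is combined with a hypothetical workspace-feasibility penalty; the function name `hybrid_loss`, the weight `lambda_phys`, and the box-bound penalty form are all assumptions introduced here for intuition.

```python
# Illustrative sketch only: the record does not publish the paper's loss,
# so the terms, weights, and penalty form below are assumptions.
import torch
import torch.nn.functional as F

def hybrid_loss(logits, target_tokens, pred_positions,
                workspace_min, workspace_max, lambda_phys=0.5):
    """Combine a command-generation loss with a physical-constraint penalty.

    logits:            (batch, seq_len, vocab) scores over command tokens
    target_tokens:     (batch, seq_len) ground-truth command token ids
    pred_positions:    (batch, 3) predicted end-effector target positions
    workspace_min/max: (3,) axis-aligned bounds of the reachable workspace
    lambda_phys:       weight of the constraint term (hypothetical value)
    """
    # Standard sequence cross-entropy over the generated command tokens.
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         target_tokens.reshape(-1))

    # Penalize predicted targets that leave the reachable workspace;
    # inside the box the penalty is zero (hinge-style feasibility term).
    below = F.relu(workspace_min - pred_positions)
    above = F.relu(pred_positions - workspace_max)
    phys = (below + above).pow(2).sum(dim=-1).mean()

    return ce + lambda_phys * phys

# Toy usage with random tensors, just to show the expected shapes.
logits = torch.randn(2, 5, 100)
targets = torch.randint(0, 100, (2, 5))
positions = torch.randn(2, 3)
lo = torch.tensor([-0.5, -0.5, 0.0])
hi = torch.tensor([0.5, 0.5, 0.8])
print(hybrid_loss(logits, targets, positions, lo, hi).item())
```

A hinge-style penalty keeps the constraint term at zero for feasible predictions, so it shapes the gradient only when a generated command would drive the end effector outside the reachable workspace.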