Dynamics Learning With Object-Centric Interaction Networks for Robot Manipulation

Understanding the physical interactions of objects with their environments is critical for multi-object robotic manipulation tasks. A predictive dynamics model forecasts the future states of manipulated objects, which can then be used to plan plausible actions that drive the objects to desired goal states. However, most current approaches to dynamics learning from high-dimensional visual observations have limitations: they either rely on large amounts of real-world data or build a model for a fixed number of objects, which makes them difficult to generalize to unseen objects. This paper proposes a Deep Object-centric Interaction Network (DOIN) that encodes object-centric representations for multiple objects from raw RGB images and reasons about the future trajectory of each object in latent space. The proposed model is trained only on large amounts of random interaction data collected in simulation. Combined with a model predictive control framework, the learned model enables a robot to search for action sequences that manipulate objects into desired configurations. The proposed method is evaluated on multi-object pushing tasks in both simulation and real-world experiments. Extensive simulation experiments show that DOIN achieves high prediction accuracy in scenes with varying numbers of objects and outperforms state-of-the-art baselines on the manipulation tasks. Real-world experiments demonstrate that the model trained on simulated data transfers to a real robot and successfully performs multi-object pushing for previously unseen objects with significant variations in shape and size.
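The planning loop the abstract describes (rolling a learned dynamics model forward and searching for action sequences under model predictive control) can be sketched as follows. This is a minimal illustrative random-shooting example, not the paper's DOIN implementation: the toy linear `predict_next` stands in for the learned interaction network, and all function names, dimensions, and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_next(latents, action):
    # Stand-in for the learned latent dynamics model: given per-object
    # latents (n_objects, d) and a push action (d,), predict the next
    # latents. A toy linear update replaces the interaction network.
    return 0.95 * latents + 0.05 * action  # action broadcasts over objects

def cost(latents, goal_latents):
    # Squared distance between predicted and goal object states in latent space.
    return float(np.sum((latents - goal_latents) ** 2))

def plan_action_sequence(latents, goal_latents, horizon=5, n_candidates=256):
    """Random-shooting MPC: sample candidate action sequences, roll each
    one out with the learned model, and keep the cheapest sequence."""
    best_seq, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, 2))  # candidate pushes
        z = latents
        for a in seq:
            z = predict_next(z, a)  # roll the model forward one step
        c = cost(z, goal_latents)
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq, best_cost

start = rng.normal(size=(3, 2))  # three objects, 2-D latent each
goal = np.zeros((3, 2))          # desired configuration in latent space
seq, c = plan_action_sequence(start, goal)
```

In practice only the first action of the best sequence would be executed before re-planning from the new observation, which is how the closed-loop MPC framework in the paper operates.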

Bibliographic Details
Main Authors: Jiayu Wang, Chuxiong Hu, Yunan Wang, Yu Zhu
Format: Article
Language:English
Published: IEEE 2021-01-01
Series:IEEE Access
Subjects: Deep learning in robotic manipulation; model learning; representation learning; visual learning
Online Access:https://ieeexplore.ieee.org/document/9420758/
DOI: 10.1109/ACCESS.2021.3077117
ISSN: 2169-3536
Published in: IEEE Access, vol. 9, pp. 68277-68288, 2021
Author Affiliations: Jiayu Wang (ORCID: 0000-0001-8367-258X), Chuxiong Hu (ORCID: 0000-0002-3504-3065), Yunan Wang (ORCID: 0000-0002-9812-7709), and Yu Zhu; Department of Mechanical Engineering, State Key Laboratory of Tribology, Tsinghua University, Beijing, China