Combining Prior Knowledge and Reinforcement Learning for Parallel Telescopic-Legged Bipedal Robot Walking

Bibliographic Details
Main Authors: Jie Xue, Jiaqi Huangfu, Yunfeng Hou, Haiming Mou
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/6/979
Description
Summary: The parallel dual-slider telescopic leg bipedal robot (L04) is characterized by its simple structure and low leg rotational inertia, which contribute to its walking efficiency. However, end-to-end methods often overlook the robot’s physical structure, leading to difficulties in maintaining the parallel alignment of the dual sliders, which in turn compromises walking stability. One potential solution to this issue involves utilizing imitation learning to replicate human motion data. However, the dual telescopic leg structure of the L04 robot makes it difficult to perform motion retargeting of human motion data. To enable L04 walking, we design a method that integrates prior feedforward with reinforcement learning (PFRL), specifically tailored for the parallel dual-slider structure. We utilize prior knowledge as a feedforward action to compensate for system nonlinearities; meanwhile, the feedback action generated by the policy network adaptively regulates dynamic balance and, combined with the feedforward action, jointly controls the robot’s walking. PFRL enforces constraints within the motion space to mitigate the chaotic behavior of the parallel dual sliders. Experimental results show that our method successfully achieves sim2real transfer on a real bipedal robot without the need for domain randomization techniques or intricate reward functions. L04 achieves omnidirectional walking with minimal energy consumption and exhibits robustness against external disturbances.
ISSN: 2227-7390
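
Illustrative note: the Summary describes PFRL as a feedforward action derived from prior knowledge, combined with a feedback action produced by a policy network. The record does not say how the two are combined or what the action space is, so the following Python sketch rests entirely on assumptions: the actions are taken to be four slider-length targets (two sliders per leg), the prior is a hypothetical sinusoidal gait reference, the combination is a simple sum, and the motion-space constraint is modeled as a clip on the feedback term. All function names, parameters, and numeric values are invented for illustration and are not taken from the article.

```python
import math


def feedforward_action(phase, amplitude=0.05, offset=0.30):
    """Hypothetical prior: sinusoidal slider-length targets for a gait phase in [0, 1).

    Both sliders of one leg receive the same target, so the prior itself keeps
    them parallel. The two legs move in antiphase. Values are assumptions.
    """
    left = offset + amplitude * math.sin(2.0 * math.pi * phase)
    right = offset + amplitude * math.sin(2.0 * math.pi * phase + math.pi)
    return [left, left, right, right]


def placeholder_policy(observation):
    """Stand-in for a trained policy network; here it returns zero feedback."""
    return [0.0, 0.0, 0.0, 0.0]


def combine_actions(feedforward, feedback, feedback_limit=0.02):
    """Assumed combination rule: add a clipped feedback term to the prior.

    Clipping keeps the final command within feedback_limit of the prior,
    which is one simple way to constrain the motion space.
    """
    combined = []
    for ff, fb in zip(feedforward, feedback):
        clipped = max(-feedback_limit, min(feedback_limit, fb))
        combined.append(ff + clipped)
    return combined


if __name__ == "__main__":
    # Step through one gait cycle with the placeholder policy.
    for step in range(4):
        phase = step / 4.0
        prior = feedforward_action(phase)
        feedback = placeholder_policy(observation=None)
        print(phase, combine_actions(prior, feedback))
```

With the placeholder policy the command equals the prior. Under these assumptions, a trained policy could only shift each slider target by at most feedback_limit, which is how the sketch keeps the two sliders of a leg close to parallel.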