HPRS: hierarchical potential-based reward shaping from task specifications
The automatic synthesis of policies for robotics systems through reinforcement learning relies upon, and is intimately guided by, a reward signal. Consequently, this signal should faithfully reflect the designer’s intentions, which are often expressed as a collection of high-level requirements. Seve...
Main Authors: Luigi Berducci, Edgar A. Aguilar, Dejan Ničković, Radu Grosu
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-02-01
Series: Frontiers in Robotics and AI
Subjects: robotics; robot learning; reinforcement learning; reward shaping; formal specifications
Online Access: https://www.frontiersin.org/articles/10.3389/frobt.2024.1444188/full
author | Luigi Berducci; Edgar A. Aguilar; Dejan Ničković; Radu Grosu |
author_facet | Luigi Berducci; Edgar A. Aguilar; Dejan Ničković; Radu Grosu |
author_sort | Luigi Berducci |
collection | DOAJ |
description | The automatic synthesis of policies for robotics systems through reinforcement learning relies upon, and is intimately guided by, a reward signal. Consequently, this signal should faithfully reflect the designer’s intentions, which are often expressed as a collection of high-level requirements. Several works have been developing automated reward definitions from formal requirements, but they show limitations in producing a signal which is both effective in training and able to fulfill multiple heterogeneous requirements. In this paper, we define a task as a partially ordered set of safety, target, and comfort requirements and introduce an automated methodology to enforce a natural order among requirements into the reward signal. We perform this by automatically translating the requirements into a sum of safety, target, and comfort rewards, where the target reward is a function of the safety reward and the comfort reward is a function of the safety and target rewards. Using a potential-based formulation, we enhance sparse to dense rewards and formally prove this to maintain policy optimality. We call our novel approach hierarchical, potential-based reward shaping (HPRS). Our experiments on eight robotics benchmarks demonstrate that HPRS is able to generate policies satisfying complex hierarchical requirements. Moreover, compared with the state of the art, HPRS achieves faster convergence and superior performance with respect to the rank-preserving policy-assessment metric. By automatically balancing competing requirements, HPRS produces task-satisfying policies with improved comfort and without manual parameter tuning. Through ablation studies, we analyze the impact of individual requirement classes on emergent behavior. Our experiments show that HPRS benefits from comfort requirements when aligned with the target and safety and ignores them when in conflict with the safety or target requirements. Finally, we validate the practical usability of HPRS in real-world robotics applications, including two sim-to-real experiments using F1TENTH vehicles. These experiments show that a hierarchical design of task specifications facilitates the sim-to-real transfer without any domain adaptation. |
format | Article |
id | doaj-art-a8fdabe093aa4579aad3391aeb3c7179 |
institution | Kabale University |
issn | 2296-9144 |
language | English |
publishDate | 2025-02-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Robotics and AI |
spelling | doaj-art-a8fdabe093aa4579aad3391aeb3c7179 (2025-02-10T08:58:21Z); Frontiers Media S.A.; Frontiers in Robotics and AI; ISSN 2296-9144; 2025-02-01; Vol. 11; doi:10.3389/frobt.2024.1444188; article 1444188; HPRS: hierarchical potential-based reward shaping from task specifications; Luigi Berducci (Cyber-Physical Systems Group, Computer Engineering, TU Wien, Vienna, Austria); Edgar A. Aguilar (Center for Digital Safety and Security, AIT Austrian Institute of Technology GmbH, Vienna, Austria); Dejan Ničković (Center for Digital Safety and Security, AIT Austrian Institute of Technology GmbH, Vienna, Austria); Radu Grosu (Cyber-Physical Systems Group, Computer Engineering, TU Wien, Vienna, Austria); abstract and keywords as given in the description and topic fields; https://www.frontiersin.org/articles/10.3389/frobt.2024.1444188/full |
spellingShingle | Luigi Berducci; Edgar A. Aguilar; Dejan Ničković; Radu Grosu; HPRS: hierarchical potential-based reward shaping from task specifications; Frontiers in Robotics and AI; robotics; robot learning; reinforcement learning; reward shaping; formal specifications |
title | HPRS: hierarchical potential-based reward shaping from task specifications |
title_full | HPRS: hierarchical potential-based reward shaping from task specifications |
title_fullStr | HPRS: hierarchical potential-based reward shaping from task specifications |
title_full_unstemmed | HPRS: hierarchical potential-based reward shaping from task specifications |
title_short | HPRS: hierarchical potential-based reward shaping from task specifications |
title_sort | hprs hierarchical potential based reward shaping from task specifications |
topic | robotics; robot learning; reinforcement learning; reward shaping; formal specifications |
url | https://www.frontiersin.org/articles/10.3389/frobt.2024.1444188/full |
work_keys_str_mv | AT luigiberducci hprshierarchicalpotentialbasedrewardshapingfromtaskspecifications AT edgaraaguilar hprshierarchicalpotentialbasedrewardshapingfromtaskspecifications AT dejannickovic hprshierarchicalpotentialbasedrewardshapingfromtaskspecifications AT radugrosu hprshierarchicalpotentialbasedrewardshapingfromtaskspecifications |
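
Illustrative note (not part of the catalog record): the description field above explains that HPRS treats a task as a partially ordered set of safety, target, and comfort requirements, turns each class into a reward term where lower-priority terms depend on higher-priority ones, and applies potential-based shaping to densify the signal while preserving policy optimality. The sketch below is a minimal, hypothetical Python rendering of that idea; the names (`Requirement`, `hierarchical_potential`, `shaped_reward`) and the specific aggregation choices (min over safety, mean over target and comfort) are assumptions made for illustration and are not taken from the authors' implementation.

```python
# Minimal illustrative sketch of hierarchical potential-based reward shaping
# (HPRS) as summarized in the abstract. Hypothetical names and aggregation
# choices; not the authors' implementation.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Requirement:
    # rho(state) -> degree of satisfaction in [0, 1] for one requirement.
    rho: Callable[[object], float]


def hierarchical_potential(state,
                           safety: List[Requirement],
                           target: List[Requirement],
                           comfort: List[Requirement]) -> float:
    """Potential Phi(s) encoding the partial order safety > target > comfort.

    Lower-priority terms are scaled by the satisfaction of the higher-priority
    classes, so comfort contributes only to the extent that safety and target
    are already satisfied.
    """
    s = min((r.rho(state) for r in safety), default=1.0)
    t = sum(r.rho(state) for r in target) / max(len(target), 1)
    c = sum(r.rho(state) for r in comfort) / max(len(comfort), 1)
    return s + s * t + s * t * c


def shaped_reward(r_env: float, state, next_state, gamma: float,
                  safety, target, comfort) -> float:
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    Shaping of this form is known to preserve the set of optimal policies
    (Ng et al., 1999), which is the optimality guarantee the abstract refers to.
    """
    phi = lambda x: hierarchical_potential(x, safety, target, comfort)
    return r_env + gamma * phi(next_state) - phi(state)
```

In a hypothetical training loop, `shaped_reward` would replace the raw environment reward; because the shaping term telescopes along trajectories, the denser signal can speed up learning without changing which policies are optimal.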