HPRS: hierarchical potential-based reward shaping from task specifications

The automatic synthesis of policies for robotics systems through reinforcement learning relies upon, and is intimately guided by, a reward signal. Consequently, this signal should faithfully reflect the designer’s intentions, which are often expressed as a collection of high-level requirements. Several works have developed automated reward definitions from formal requirements, but they show limitations in producing a signal that is both effective in training and able to fulfill multiple heterogeneous requirements. In this paper, we define a task as a partially ordered set of safety, target, and comfort requirements and introduce an automated methodology to enforce a natural order among requirements into the reward signal. We do so by automatically translating the requirements into a sum of safety, target, and comfort rewards, where the target reward is a function of the safety reward and the comfort reward is a function of the safety and target rewards. Using a potential-based formulation, we turn sparse rewards into dense rewards and formally prove that this preserves policy optimality. We call our novel approach hierarchical potential-based reward shaping (HPRS). Our experiments on eight robotics benchmarks demonstrate that HPRS is able to generate policies satisfying complex hierarchical requirements. Moreover, compared with the state of the art, HPRS achieves faster convergence and superior performance with respect to the rank-preserving policy-assessment metric. By automatically balancing competing requirements, HPRS produces task-satisfying policies with improved comfort and without manual parameter tuning. Through ablation studies, we analyze the impact of individual requirement classes on emergent behavior. Our experiments show that HPRS benefits from comfort requirements when they align with the safety and target requirements, and ignores them when they conflict with them. Finally, we validate the practical usability of HPRS in real-world robotics applications, including two sim-to-real experiments using F1TENTH vehicles. These experiments show that a hierarchical design of task specifications facilitates sim-to-real transfer without any domain adaptation.
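
To make the mechanism sketched in the abstract concrete, here is a minimal, illustrative Python sketch of hierarchical potential-based reward shaping. It is not the authors' reference implementation: the names (TaskSpec, potential, shaped_reward) and the modeling choices are assumptions. It assumes each requirement can be scored in [0, 1], gates lower-priority classes by higher-priority ones via multiplication, and densifies the signal with the standard potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s), which is known to preserve the set of optimal policies; the paper's exact aggregation and indicator functions may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A requirement is scored by a function mapping a state to [0, 1], where 1 means "satisfied".
Score = Callable[[object], float]

@dataclass
class TaskSpec:
    safety: Sequence[Score]   # highest-priority requirements
    target: Sequence[Score]   # goal requirements, subordinate to safety
    comfort: Sequence[Score]  # soft requirements, subordinate to safety and target

def _avg(scores: Sequence[Score], state) -> float:
    """Average satisfaction score of a requirement class for a given state."""
    return sum(f(state) for f in scores) / max(len(scores), 1)

def potential(spec: TaskSpec, state) -> float:
    """Hierarchical potential: lower-priority terms are gated by higher-priority ones,
    encoding the partial order safety > target > comfort."""
    safety = _avg(spec.safety, state)
    target = _avg(spec.target, state)
    comfort = _avg(spec.comfort, state)
    return safety + safety * target + safety * target * comfort

def shaped_reward(spec: TaskSpec, s, s_next, base_reward: float = 0.0, gamma: float = 0.99) -> float:
    """Adds the potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s),
    which preserves policy optimality (Ng et al., 1999)."""
    return base_reward + gamma * potential(spec, s_next) - potential(spec, s)

if __name__ == "__main__":
    # Toy 1-D example with hypothetical requirements: stay within bounds (safety),
    # approach the origin (target), and keep the velocity small (comfort).
    spec = TaskSpec(
        safety=[lambda s: 1.0 if abs(s["x"]) < 2.0 else 0.0],
        target=[lambda s: max(0.0, 1.0 - abs(s["x"]))],
        comfort=[lambda s: max(0.0, 1.0 - abs(s["v"]))],
    )
    s, s_next = {"x": 1.5, "v": 1.0}, {"x": 1.2, "v": 0.6}
    print(shaped_reward(spec, s, s_next))
```

The multiplicative gating is one simple way to realize the hierarchy described in the abstract: the target term contributes only insofar as safety is satisfied, and the comfort term only insofar as both safety and target are, so conflicting comfort requirements cannot outweigh the higher-priority classes.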

Bibliographic Details
Main Authors: Luigi Berducci, Edgar A. Aguilar, Dejan Ničković, Radu Grosu
Author Affiliations: Luigi Berducci and Radu Grosu, Cyber-Physical Systems Group, Computer Engineering, TU Wien, Vienna, Austria; Edgar A. Aguilar and Dejan Ničković, Center for Digital Safety and Security, AIT Austrian Institute of Technology GmbH, Vienna, Austria
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-02-01
Series: Frontiers in Robotics and AI
ISSN: 2296-9144
DOI: 10.3389/frobt.2024.1444188
Subjects: robotics; robot learning; reinforcement learning; reward shaping; formal specifications
Online Access: https://www.frontiersin.org/articles/10.3389/frobt.2024.1444188/full