HPRS: hierarchical potential-based reward shaping from task specifications
The automatic synthesis of policies for robotics systems through reinforcement learning relies upon, and is intimately guided by, a reward signal. Consequently, this signal should faithfully reflect the designer’s intentions, which are often expressed as a collection of high-level requirements. Seve...
Main Authors: Luigi Berducci, Edgar A. Aguilar, Dejan Ničković, Radu Grosu
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-02-01
Series: Frontiers in Robotics and AI
Subjects: robotics; robot learning; reinforcement learning; reward shaping; formal specifications
Online Access: https://www.frontiersin.org/articles/10.3389/frobt.2024.1444188/full
author | Luigi Berducci; Edgar A. Aguilar; Dejan Ničković; Radu Grosu |
author_facet | Luigi Berducci; Edgar A. Aguilar; Dejan Ničković; Radu Grosu |
author_sort | Luigi Berducci |
collection | DOAJ |
description | The automatic synthesis of policies for robotics systems through reinforcement learning relies upon, and is intimately guided by, a reward signal. Consequently, this signal should faithfully reflect the designer’s intentions, which are often expressed as a collection of high-level requirements. Several works have been developing automated reward definitions from formal requirements, but they show limitations in producing a signal which is both effective in training and able to fulfill multiple heterogeneous requirements. In this paper, we define a task as a partially ordered set of safety, target, and comfort requirements and introduce an automated methodology to enforce a natural order among requirements into the reward signal. We perform this by automatically translating the requirements into a sum of safety, target, and comfort rewards, where the target reward is a function of the safety reward and the comfort reward is a function of the safety and target rewards. Using a potential-based formulation, we enhance sparse to dense rewards and formally prove this to maintain policy optimality. We call our novel approach hierarchical, potential-based reward shaping (HPRS). Our experiments on eight robotics benchmarks demonstrate that HPRS is able to generate policies satisfying complex hierarchical requirements. Moreover, compared with the state of the art, HPRS achieves faster convergence and superior performance with respect to the rank-preserving policy-assessment metric. By automatically balancing competing requirements, HPRS produces task-satisfying policies with improved comfort and without manual parameter tuning. Through ablation studies, we analyze the impact of individual requirement classes on emergent behavior. Our experiments show that HPRS benefits from comfort requirements when aligned with the target and safety and ignores them when in conflict with the safety or target requirements. Finally, we validate the practical usability of HPRS in real-world robotics applications, including two sim-to-real experiments using F1TENTH vehicles. These experiments show that a hierarchical design of task specifications facilitates the sim-to-real transfer without any domain adaptation. |
format | Article |
id | doaj-art-a8fdabe093aa4579aad3391aeb3c7179 |
institution | Kabale University |
issn | 2296-9144 |
language | English |
publishDate | 2025-02-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Robotics and AI |
spelling | doaj-art-a8fdabe093aa4579aad3391aeb3c7179 (2025-02-10T08:58:21Z); Frontiers Media S.A.; Frontiers in Robotics and AI; ISSN 2296-9144; 2025-02-01; Vol. 11; doi:10.3389/frobt.2024.1444188; article 1444188; HPRS: hierarchical potential-based reward shaping from task specifications; Luigi Berducci (Cyber-Physical Systems Group, Computer Engineering, TU Wien, Vienna, Austria); Edgar A. Aguilar (Center for Digital Safety and Security, AIT Austrian Institute of Technology GmbH, Vienna, Austria); Dejan Ničković (Center for Digital Safety and Security, AIT Austrian Institute of Technology GmbH, Vienna, Austria); Radu Grosu (Cyber-Physical Systems Group, Computer Engineering, TU Wien, Vienna, Austria); abstract and keywords as given in the description and topic fields; https://www.frontiersin.org/articles/10.3389/frobt.2024.1444188/full |
spellingShingle | Luigi Berducci; Edgar A. Aguilar; Dejan Ničković; Radu Grosu; HPRS: hierarchical potential-based reward shaping from task specifications; Frontiers in Robotics and AI; robotics; robot learning; reinforcement learning; reward shaping; formal specifications |
title | HPRS: hierarchical potential-based reward shaping from task specifications |
title_full | HPRS: hierarchical potential-based reward shaping from task specifications |
title_fullStr | HPRS: hierarchical potential-based reward shaping from task specifications |
title_full_unstemmed | HPRS: hierarchical potential-based reward shaping from task specifications |
title_short | HPRS: hierarchical potential-based reward shaping from task specifications |
title_sort | hprs hierarchical potential based reward shaping from task specifications |
topic | robotics; robot learning; reinforcement learning; reward shaping; formal specifications |
url | https://www.frontiersin.org/articles/10.3389/frobt.2024.1444188/full |
work_keys_str_mv | AT luigiberducci hprshierarchicalpotentialbasedrewardshapingfromtaskspecifications AT edgaraaguilar hprshierarchicalpotentialbasedrewardshapingfromtaskspecifications AT dejannickovic hprshierarchicalpotentialbasedrewardshapingfromtaskspecifications AT radugrosu hprshierarchicalpotentialbasedrewardshapingfromtaskspecifications |
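
Illustrative note (not part of the catalog record): the description field above explains that HPRS treats a task as a partially ordered set of safety, target, and comfort requirements, turns each class into a reward term where lower-priority terms depend on higher-priority ones, and applies potential-based shaping to densify the signal while preserving policy optimality. The sketch below is a minimal, hypothetical Python rendering of that idea; the names (`Requirement`, `hierarchical_potential`, `shaped_reward`) and the specific aggregation choices (min over safety, mean over target and comfort) are assumptions made for illustration and are not taken from the authors' implementation.

```python
# Minimal illustrative sketch of hierarchical potential-based reward shaping
# (HPRS) as summarized in the abstract. Hypothetical names and aggregation
# choices; not the authors' implementation.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Requirement:
    # rho(state) -> degree of satisfaction in [0, 1] for one requirement.
    rho: Callable[[object], float]


def hierarchical_potential(state,
                           safety: List[Requirement],
                           target: List[Requirement],
                           comfort: List[Requirement]) -> float:
    """Potential Phi(s) encoding the partial order safety > target > comfort.

    Lower-priority terms are scaled by the satisfaction of the higher-priority
    classes, so comfort contributes only to the extent that safety and target
    are already satisfied.
    """
    s = min((r.rho(state) for r in safety), default=1.0)
    t = sum(r.rho(state) for r in target) / max(len(target), 1)
    c = sum(r.rho(state) for r in comfort) / max(len(comfort), 1)
    return s + s * t + s * t * c


def shaped_reward(r_env: float, state, next_state, gamma: float,
                  safety, target, comfort) -> float:
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    Shaping of this form is known to preserve the set of optimal policies
    (Ng et al., 1999), which is the optimality guarantee the abstract refers to.
    """
    phi = lambda x: hierarchical_potential(x, safety, target, comfort)
    return r_env + gamma * phi(next_state) - phi(state)
```

In a hypothetical training loop, `shaped_reward` would replace the raw environment reward; because the shaping term telescopes along trajectories, the denser signal can speed up learning without changing which policies are optimal.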