Demonstration and offset augmented meta reinforcement learning with sparse rewards

Abstract: This paper introduces DOAMRL, a novel meta-reinforcement learning (meta-RL) method that extends the Model-Agnostic Meta-Learning (MAML) framework. The method addresses a key limitation of existing meta-RL approaches, which struggle to effectively use suboptimal demonstrations to guide training in sparse reward environments. DOAMRL combines reinforcement learning (RL) and imitation learning (IL) within the inner loop of the MAML framework, with dynamically adjusted weights applied to the IL component. This enables the method to leverage the exploration strengths of RL and the efficiency benefits of IL at different stages of training. Additionally, DOAMRL introduces a meta-learned parameter offset, which enhances targeted exploration in sparse reward settings, helping to guide the meta-policy toward regions with non-zero rewards. To further mitigate the impact of suboptimal demonstration data on meta-training, we propose a novel demonstration data enhancement module that iteratively improves the quality of the demonstrations. We provide a comprehensive analysis of the proposed method, justifying its design choices. A thorough comparison with existing methods across stages (including training and adaptation), using both optimal and suboptimal demonstrations, along with results from ablation and sensitivity analyses, demonstrates that DOAMRL outperforms existing approaches in performance, applicability, and robustness.
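The abstract describes three mechanisms: an inner-loop update that mixes an RL gradient with a dynamically weighted imitation-learning gradient, a meta-learned parameter offset applied before adaptation, and a demonstration enhancement module. The Python sketch below is purely illustrative and is not the authors' implementation; the names inner_loop_adapt, il_weight_schedule, inner_lr, and the linear decay schedule are assumptions made for the example.

```python
# Illustrative sketch only -- not the DOAMRL authors' code. Function and
# parameter names (inner_loop_adapt, il_weight_schedule, inner_lr, w0) are
# assumptions for this example; the paper's exact update rules may differ.
import numpy as np


def il_weight_schedule(step, total_steps, w0=1.0):
    """Assumed linear decay: lean on demonstrations early, on RL exploration later."""
    return w0 * max(0.0, 1.0 - step / total_steps)


def inner_loop_adapt(theta, offset, rl_grad_fn, il_grad_fn,
                     step, total_steps, inner_lr=0.1):
    """One MAML-style inner-loop step for a single task.

    theta      : meta-policy parameters (np.ndarray)
    offset     : meta-learned parameter offset, added before adaptation to
                 bias the starting point toward regions with non-zero reward
    rl_grad_fn : callable returning the gradient of the RL loss at given params
    il_grad_fn : callable returning the gradient of the imitation (behavior
                 cloning) loss on the task's demonstrations
    """
    theta_task = theta + offset                    # apply meta-learned offset
    w_il = il_weight_schedule(step, total_steps)   # dynamically adjusted IL weight
    grad = rl_grad_fn(theta_task) + w_il * il_grad_fn(theta_task)
    return theta_task - inner_lr * grad            # gradient step on the combined loss


# Toy usage with quadratic surrogate losses standing in for the RL / IL objectives.
if __name__ == "__main__":
    theta = np.zeros(4)
    offset = 0.05 * np.ones(4)
    rl_grad = lambda p: 2.0 * (p - 1.0)   # stand-in RL objective, optimum at 1.0
    il_grad = lambda p: 2.0 * (p - 0.8)   # stand-in demonstration target at 0.8
    adapted = inner_loop_adapt(theta, offset, rl_grad, il_grad, step=0, total_steps=10)
    print(adapted)
```

In a full method, an outer loop would meta-update both theta and the offset across tasks, and the demonstration data enhancement module mentioned in the abstract would iteratively refresh the demonstrations used by il_grad_fn; those components are omitted from this sketch.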

Bibliographic Details
Main Authors: Haorui Li, Jiaqi Liang, Xiaoxuan Wang, Chengzhi Jiang, Linjing Li, Daniel Zeng
Author Affiliation: State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (all authors)
Format: Article
Language: English
Published: Springer, 2025-02-01
Series: Complex & Intelligent Systems
ISSN: 2199-4536, 2198-6053
Subjects: Meta learning; Reinforcement learning; Sparse reward; Suboptimal demonstration; One-shot learning; Imitation learning
Online Access: https://doi.org/10.1007/s40747-025-01785-0