Heuristic Sarsa algorithm based on value function transfer
Main Authors: | Jianping CHEN, Zhengxia YANG, Quan LIU, Hongjie WU, Yang XU, Qiming FU |
---|---|
Format: | Article |
Language: | Chinese (zho) |
Published: | Editorial Department of Journal on Communications, 2018-08-01 |
Series: | Tongxin xuebao |
Subjects: | reinforcement learning; value function transfer; bisimulation metric; variational Bayes |
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2018133/ |
author | Jianping CHEN, Zhengxia YANG, Quan LIU, Hongjie WU, Yang XU, Qiming FU |
collection | DOAJ |
description | To address the slow convergence of the traditional Sarsa algorithm, an improved heuristic Sarsa algorithm based on value function transfer was proposed. The algorithm combines traditional Sarsa with value function transfer: it introduces the bisimulation metric and uses it to measure the similarity between new tasks and historical tasks that share the same state space and action space, which speeds up convergence. In addition, drawing on heuristic exploration, the algorithm introduces Bayesian inference and uses variational inference to measure information gain. The resulting information gain is then used to build an intrinsic reward function model that acts as an exploration factor, further accelerating convergence. The proposed algorithm was applied to the classic Grid World problem and compared with the traditional Sarsa algorithm, the Q-Learning algorithm, the VFT-Sarsa algorithm, and the IGP-Sarsa algorithm, which has better convergence performance. The experimental results show that the proposed algorithm converges faster and more stably. |
id | doaj-art-49b33baa4cd84d8a8dd4a16186075198 |
institution | Kabale University |
issn | 1000-436X |
topic | reinforcement learning; value function transfer; bisimulation metric; variational Bayes |
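The abstract names its mechanisms without giving equations, so the Python sketch below illustrates them on a small deterministic Grid World under explicit assumptions. It has three parts. First, a bisimulation-style distance between states of a new task and a historical task that share the same state and action spaces, computed by fixed-point iteration of the common form d(s, t) = max_a [(1 - γ)|R(s, a) - R'(t, a)| + γ·d(T(s, a), T'(t, a))]. For deterministic transitions, the Kantorovich term reduces to the distance between successor states. Second, a value-function transfer that copies each new state's Q-row from its nearest historical state. Third, tabular Sarsa with an intrinsic exploration bonus. The grid, the reward design, the nearest-state transfer rule and every name (GridWorld, bisim_metric, transfer_q, sarsa, beta) are hypothetical. In particular, the paper measures information gain with variational Bayesian inference. This sketch substitutes a much simpler count-based bonus beta/sqrt(N(s, a)) purely as a stand-in, so it is not the authors' algorithm.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]
# One action set shared by both tasks (the abstract requires identical action spaces).
ACTIONS: List[Tuple[int, int]] = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


class GridWorld:
    """Hypothetical deterministic grid: reward 1 on entering the goal, else 0."""

    def __init__(self, size: int, goal: State) -> None:
        self.size, self.goal = size, goal
        self.states: List[State] = [(r, c) for r in range(size) for c in range(size)]

    def step(self, s: State, a: int) -> Tuple[State, float, bool]:
        if s == self.goal:  # the goal is absorbing and yields no further reward
            return s, 0.0, True
        dr, dc = ACTIONS[a]
        # Moves that would leave the grid keep the agent in place.
        nxt = (min(max(s[0] + dr, 0), self.size - 1), min(max(s[1] + dc, 0), self.size - 1))
        return nxt, (1.0 if nxt == self.goal else 0.0), nxt == self.goal


def bisim_metric(new_task: GridWorld, src_task: GridWorld, gamma: float = 0.9,
                 iters: int = 100) -> Dict[Tuple[State, State], float]:
    """Fixed-point iteration of a cross-task bisimulation-style distance d(s, t)."""
    d = {(s, t): 0.0 for s in new_task.states for t in src_task.states}
    for _ in range(iters):
        updated = {}
        for s in new_task.states:
            for t in src_task.states:
                worst = 0.0
                for a in range(len(ACTIONS)):
                    s2, r1, _ = new_task.step(s, a)
                    t2, r2, _ = src_task.step(t, a)
                    # Deterministic transitions: the Kantorovich term is simply d(s2, t2).
                    worst = max(worst, (1 - gamma) * abs(r1 - r2) + gamma * d[(s2, t2)])
                updated[(s, t)] = worst
        d = updated
    return d


def transfer_q(q_src: Dict[State, List[float]], d: Dict[Tuple[State, State], float],
               new_states: List[State], src_states: List[State]) -> Dict[State, List[float]]:
    """Initialise each new-task state with a copy of its nearest source state's Q-row."""
    q = {}
    for s in new_states:
        nearest = min(src_states, key=lambda t: d[(s, t)])
        q[s] = list(q_src[nearest])  # copy, so later updates do not alter the source table
    return q


def sarsa(env: GridWorld, q: Optional[Dict[State, List[float]]] = None,
          episodes: int = 200, alpha: float = 0.5, gamma: float = 0.9, eps: float = 0.1,
          beta: float = 0.1, start: State = (0, 0), max_steps: int = 100,
          seed: int = 0) -> Tuple[Dict[State, List[float]], List[int]]:
    """Tabular Sarsa; a count-based bonus stands in for the information-gain reward."""
    rng = random.Random(seed)
    if q is None:
        q = {s: [0.0] * len(ACTIONS) for s in env.states}
    counts: Dict[Tuple[State, int], int] = {}

    def policy(s: State) -> int:  # epsilon-greedy with random tie-breaking
        if rng.random() < eps:
            return rng.randrange(len(ACTIONS))
        best = max(q[s])
        return rng.choice([a for a, v in enumerate(q[s]) if v == best])

    lengths: List[int] = []
    for _ in range(episodes):
        s, a = start, policy(start)
        for step in range(1, max_steps + 1):
            s2, r, done = env.step(s, a)
            counts[(s, a)] = counts.get((s, a), 0) + 1
            r += beta / math.sqrt(counts[(s, a)])  # hypothetical intrinsic reward
            a2 = policy(s2)
            target = r if done else r + gamma * q[s2][a2]  # on-policy: bootstrap on a2
            q[s][a] += alpha * (target - q[s][a])
            s, a = s2, a2
            if done:
                break
        lengths.append(step)
    return q, lengths


if __name__ == "__main__":
    historical = GridWorld(4, goal=(3, 3))  # previously solved task
    new_task = GridWorld(4, goal=(3, 2))    # same state and action spaces, new goal
    q_hist, _ = sarsa(historical, episodes=200)
    dist = bisim_metric(new_task, historical)
    q_init = transfer_q(q_hist, dist, new_task.states, historical.states)
    _, lengths = sarsa(new_task, q=q_init, episodes=50, seed=1)
    print("mean length of the last 10 episodes:", sum(lengths[-10:]) / 10)
```

Copying the whole Q-row of the nearest historical state is only one reading of "value function transfer". A distance-weighted blend over several historical states would fit the abstract equally well.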