Heuristic Sarsa algorithm based on value function transfer

To address the slow convergence of the traditional Sarsa algorithm, an improved heuristic Sarsa algorithm based on value function transfer is proposed. The algorithm combines traditional Sarsa with value function transfer; it introduces a bisimulation metric and uses it...

Bibliographic Details
Main Authors: Jianping CHEN, Zhengxia YANG, Quan LIU, Hongjie WU, Yang XU, Qiming FU
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications 2018-08-01
Series: Tongxin xuebao
Subjects: reinforcement learning; value function transfer; bisimulation metric; variational Bayes
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2018133/
author Jianping CHEN
Zhengxia YANG
Quan LIU
Hongjie WU
Yang XU
Qiming FU
collection DOAJ
description To address the slow convergence of the traditional Sarsa algorithm, an improved heuristic Sarsa algorithm based on value function transfer is proposed. The algorithm combines traditional Sarsa with value function transfer: it introduces a bisimulation metric to measure the similarity between a new task and historical tasks that share the same state space and action space, and thereby speeds up convergence. In addition, as a heuristic exploration method, the algorithm introduces Bayesian inference and uses variational inference to measure information gain. The information gain is then used to build an intrinsic reward function model that serves as an exploration factor, further speeding up convergence. The proposed algorithm is applied to the classic Grid World problem and compared with the traditional Sarsa algorithm, the Q-Learning algorithm, and the VFT-Sarsa and IGP-Sarsa algorithms, which have better convergence performance; the experimental results show that the proposed algorithm converges faster and more stably.
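The two ideas in the description (warm-starting Sarsa from a transferred value function, and adding an intrinsic exploration reward) can be illustrated with a minimal tabular sketch. This is not the authors' implementation, and every name in it (`sarsa`, `q_init`, `beta`, the 4x4 grid, the reward values) is a hypothetical choice made for illustration. Two simplifications are assumed: the bisimulation-metric step that selects a similar historical task is omitted, so the source Q-table is simply passed in; and the variational information-gain reward is replaced by a plain count-based bonus, `beta / sqrt(visits)`, as a stand-in.

```python
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action, size, goal):
    """Deterministic grid move; walls clamp the agent inside the grid."""
    r = min(max(state[0] + action[0], 0), size - 1)
    c = min(max(state[1] + action[1], 0), size - 1)
    nxt = (r, c)
    # Hypothetical rewards: +1 at the goal, a small step cost elsewhere.
    return nxt, (1.0 if nxt == goal else -0.01), nxt == goal


def epsilon_greedy(q, state, eps, rng):
    """Pick a random action with probability eps, else a greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    values = q[state]
    return max(range(len(ACTIONS)), key=values.__getitem__)


def sarsa(size=4, goal=(3, 3), episodes=200, alpha=0.5, gamma=0.95,
          eps=0.1, beta=0.0, q_init=None, seed=0, max_steps=200):
    """Tabular Sarsa with an optional transferred Q-table and count bonus."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(size) for c in range(size)]
    # Value function transfer (assumed form): copy the source task's Q-table.
    q = {s: list(q_init[s]) if q_init else [0.0] * len(ACTIONS)
         for s in states}
    visits = {}
    steps_per_episode = []
    for _ in range(episodes):
        s = (0, 0)
        a = epsilon_greedy(q, s, eps, rng)
        for t in range(1, max_steps + 1):
            s2, reward, done = step(s, ACTIONS[a], size, goal)
            visits[(s, a)] = visits.get((s, a), 0) + 1
            # Stand-in intrinsic reward: count-based bonus, NOT the
            # variational information gain described in the abstract.
            reward += beta / visits[(s, a)] ** 0.5
            a2 = epsilon_greedy(q, s2, eps, rng)
            # On-policy Sarsa target uses the action actually chosen next.
            target = reward + (0.0 if done else gamma * q[s2][a2])
            q[s][a] += alpha * (target - q[s][a])
            s, a = s2, a2
            if done:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode
```

A possible use is to train once with the defaults, then call `sarsa(q_init=q_source, beta=0.1)` on a second task and compare the two `steps_per_episode` lists; whether the warm start helps depends on how similar the tasks are, which is the role the abstract assigns to the bisimulation metric.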
format Article
id doaj-art-49b33baa4cd84d8a8dd4a16186075198
institution Kabale University
issn 1000-436X
language zho
publishDate 2018-08-01
publisher Editorial Department of Journal on Communications
record_format Article
series Tongxin xuebao
title Heuristic Sarsa algorithm based on value function transfer
topic reinforcement learning
value function transfer
bisimulation metric
variational Bayes
url http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2018133/