Enhanced deep deterministic policy gradient algorithm
To address the slow convergence of the deep deterministic policy gradient (DDPG) algorithm, an enhanced deep deterministic policy gradient (E-DDPG) algorithm was proposed. Building on DDPG, two sample pools were constructed and the temporal difference (TD) error was introduced: high-priority samples were added during experience replay, and during training samples were drawn from the two pools separately. At the same time, a bisimulation metric was introduced to ensure the diversity of the selected samples and to improve the convergence rate of the algorithm. The E-DDPG algorithm was applied to the pendulum problem. The experimental results show that E-DDPG effectively improves convergence on continuous-action-space problems and has better stability.
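The record gives no implementation details for the two sample pools, so the following is a minimal Python sketch of how TD-error-routed dual replay buffers *might* look. The class name `DualReplayBuffer`, the threshold rule `td_threshold`, and the `priority_fraction` split are illustrative assumptions, not the authors' code.

```python
import random
from collections import deque

# Hypothetical sketch of the dual sample pools described in the abstract:
# transitions with a large TD error go into a "priority" pool, the rest into
# an ordinary pool, and minibatches are drawn from both pools separately.
# The routing rule and all names are assumptions, not the paper's design.
class DualReplayBuffer:
    def __init__(self, capacity=100_000, td_threshold=1.0):
        self.normal = deque(maxlen=capacity)
        self.priority = deque(maxlen=capacity)
        self.td_threshold = td_threshold

    def add(self, state, action, reward, next_state, done, td_error):
        transition = (state, action, reward, next_state, done)
        # Route by TD error: large-error transitions are kept in the priority pool.
        if abs(td_error) >= self.td_threshold:
            self.priority.append(transition)
        else:
            self.normal.append(transition)

    def sample(self, batch_size, priority_fraction=0.5):
        # Draw part of the minibatch from each pool, as the abstract suggests.
        n_pri = min(int(batch_size * priority_fraction), len(self.priority))
        n_norm = min(batch_size - n_pri, len(self.normal))
        return (random.sample(list(self.priority), n_pri)
                + random.sample(list(self.normal), n_norm))
```

In a standard DDPG loop the TD error passed to `add` would be computed with the target networks as δ = r + γ·Q′(s′, μ′(s′)) − Q(s, a) before the transition is stored.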
Main Authors: | Jianping CHEN, Chao HE, Quan LIU, Hongjie WU, Fuyuan HU, Qiming FU
---|---
Format: | Article
Language: | Chinese (zho)
Published: | Editorial Department of Journal on Communications, 2018-11-01
Series: | Tongxin xuebao (Journal on Communications)
ISSN: | 1000-436X
Subjects: | deep reinforcement learning; sample ranking; bisimulation metric; temporal difference error
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2018238/
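The abstract also credits a bisimulation metric with keeping the drawn samples diverse. A true bisimulation metric compares states by their immediate-reward difference plus a discounted distance between their transition distributions; the record does not say how the paper applies it, so the sketch below uses a crude reward-plus-state pseudo-metric as a stand-in and greedily filters near-duplicate transitions out of a minibatch. All weights and names here are hypothetical.

```python
import numpy as np

# Illustrative diversity filter: a simple pseudo-metric over transitions that
# stands in for the bisimulation metric mentioned in the abstract. Transitions
# use the (state, action, reward, next_state, done) layout from the buffer
# sketch above. The weights are hypothetical, not from the paper.
def transition_distance(t1, t2, reward_weight=1.0, state_weight=0.5):
    (s1, _, r1, _, _), (s2, _, r2, _, _) = t1, t2
    return (reward_weight * abs(r1 - r2)
            + state_weight * float(np.linalg.norm(np.asarray(s1) - np.asarray(s2))))

def diverse_subset(batch, min_dist=0.1):
    # Greedily keep only transitions at least min_dist away from every
    # transition already kept, so the minibatch is not dominated by
    # near-duplicate experience.
    kept = []
    for t in batch:
        if all(transition_distance(t, u) >= min_dist for u in kept):
            kept.append(t)
    return kept
```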