Enhanced deep deterministic policy gradient algorithm
To address the slow convergence of the deep deterministic policy gradient (DDPG) algorithm, an enhanced deep deterministic policy gradient (E-DDPG) algorithm was proposed. Building on DDPG, two sample pools were constructed and the temporal difference (TD) error was introduced: high-priority samples were added during experience replay, and during training samples were drawn from the two pools separately. At the same time, a bisimulation metric was introduced to ensure the diversity of the selected samples and to improve the convergence rate of the algorithm. The E-DDPG algorithm was applied to the pendulum problem. The experimental results show that E-DDPG effectively improves convergence on continuous-action-space problems and has better stability.
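The record gives no implementation details for the two sample pools, so the following is a minimal Python sketch of how TD-error-routed dual replay buffers *might* look. The class name `DualReplayBuffer`, the threshold rule `td_threshold`, and the `priority_fraction` split are illustrative assumptions, not the authors' code.

```python
import random
from collections import deque

# Hypothetical sketch of the dual sample pools described in the abstract:
# transitions with a large TD error go into a "priority" pool, the rest into
# an ordinary pool, and minibatches are drawn from both pools separately.
# The routing rule and all names are assumptions, not the paper's design.
class DualReplayBuffer:
    def __init__(self, capacity=100_000, td_threshold=1.0):
        self.normal = deque(maxlen=capacity)
        self.priority = deque(maxlen=capacity)
        self.td_threshold = td_threshold

    def add(self, state, action, reward, next_state, done, td_error):
        transition = (state, action, reward, next_state, done)
        # Route by TD error: large-error transitions are kept in the priority pool.
        if abs(td_error) >= self.td_threshold:
            self.priority.append(transition)
        else:
            self.normal.append(transition)

    def sample(self, batch_size, priority_fraction=0.5):
        # Draw part of the minibatch from each pool, as the abstract suggests.
        n_pri = min(int(batch_size * priority_fraction), len(self.priority))
        n_norm = min(batch_size - n_pri, len(self.normal))
        return (random.sample(list(self.priority), n_pri)
                + random.sample(list(self.normal), n_norm))
```

In a standard DDPG loop the TD error passed to `add` would be computed with the target networks as δ = r + γ·Q′(s′, μ′(s′)) − Q(s, a) before the transition is stored.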
Main Authors: | Jianping CHEN, Chao HE, Quan LIU, Hongjie WU, Fuyuan HU, Qiming FU
---|---
Format: | Article
Language: | Chinese (zho)
Published: | Editorial Department of Journal on Communications, 2018-11-01
Series: | Tongxin xuebao (Journal on Communications)
ISSN: | 1000-436X
Subjects: | deep reinforcement learning; sample ranking; bisimulation metric; temporal difference error
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2018238/
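The abstract also credits a bisimulation metric with keeping the drawn samples diverse. A true bisimulation metric compares states by their immediate-reward difference plus a discounted distance between their transition distributions; the record does not say how the paper applies it, so the sketch below uses a crude reward-plus-state pseudo-metric as a stand-in and greedily filters near-duplicate transitions out of a minibatch. All weights and names here are hypothetical.

```python
import numpy as np

# Illustrative diversity filter: a simple pseudo-metric over transitions that
# stands in for the bisimulation metric mentioned in the abstract. Transitions
# use the (state, action, reward, next_state, done) layout from the buffer
# sketch above. The weights are hypothetical, not from the paper.
def transition_distance(t1, t2, reward_weight=1.0, state_weight=0.5):
    (s1, _, r1, _, _), (s2, _, r2, _, _) = t1, t2
    return (reward_weight * abs(r1 - r2)
            + state_weight * float(np.linalg.norm(np.asarray(s1) - np.asarray(s2))))

def diverse_subset(batch, min_dist=0.1):
    # Greedily keep only transitions at least min_dist away from every
    # transition already kept, so the minibatch is not dominated by
    # near-duplicate experience.
    kept = []
    for t in batch:
        if all(transition_distance(t, u) >= min_dist for u in kept):
            kept.append(t)
    return kept
```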