Adaptive temporal-difference learning via deep neural network function approximation: a non-asymptotic analysis
Format: Article
Language: English
Published: Springer, 2025-01-01
Series: Complex & Intelligent Systems
Online Access: https://doi.org/10.1007/s40747-024-01757-w
Summary: Although deep reinforcement learning has achieved notable practical success, its theoretical foundations were scarcely explored until recently. Moreover, the convergence rate of current neural temporal-difference (TD) learning algorithms is constrained, largely because of their high sensitivity to the choice of stepsize. To mitigate this issue, we propose an adaptive neural TD algorithm (AdaBNTD), inspired by the strong performance of adaptive gradient techniques in training deep neural networks. We also derive non-asymptotic bounds for AdaBNTD under Markovian observations. In particular, AdaBNTD converges to the global optimum of the mean squared projected Bellman error (MSPBE) at a rate of $\mathcal{O}(1/\sqrt{K})$, where $K$ denotes the iteration count. In addition, the effectiveness of AdaBNTD is verified on several reinforcement learning benchmark domains.
ISSN: 2199-4536, 2198-6053
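
As a concrete illustration of the adaptive TD idea described in the summary, the following is a minimal sketch of a TD(0) update with a small neural value network whose parameters are updated with an Adam-style adaptive stepsize. The environment, network architecture, and hyperparameters (`N_STATES`, `GAMMA`, `ALPHA`, `B1`, `B2`, `EPS`) are assumptions chosen for illustration; this is not the paper's AdaBNTD algorithm.

```python
# Illustrative sketch only: TD(0) with a one-hidden-layer value network and an
# Adam-style adaptive stepsize, on a toy random-walk Markov reward process.
# All sizes and hyperparameters are assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, GAMMA = 5, 0.9  # toy 5-state random walk, discount factor

def step(s):
    """Move left or right with equal probability; reward +1 on absorbing at the right end."""
    s_next = s + (1 if rng.random() < 0.5 else -1)
    if s_next < 0:
        return 0, 0.0, True
    if s_next >= N_STATES:
        return N_STATES - 1, 1.0, True
    return s_next, 0.0, False

def features(s):
    x = np.zeros(N_STATES)
    x[s] = 1.0  # one-hot state encoding
    return x

# Value network V(s) = w2 . tanh(W1 @ phi(s)).
H = 16
params = {"W1": 0.1 * rng.standard_normal((H, N_STATES)),
          "w2": 0.1 * rng.standard_normal(H)}

def value_and_grad(p, s):
    phi = features(s)
    h = np.tanh(p["W1"] @ phi)
    v = p["w2"] @ h
    dh = (1.0 - h ** 2) * p["w2"]  # backprop through tanh
    return v, {"W1": np.outer(dh, phi), "w2": h}

# Adam-style first/second moment estimates (assumed hyperparameters).
ALPHA, B1, B2, EPS = 1e-2, 0.9, 0.999, 1e-8
m = {k: np.zeros_like(v) for k, v in params.items()}
s2 = {k: np.zeros_like(v) for k, v in params.items()}

t = 0
for episode in range(2000):
    s, done = N_STATES // 2, False
    while not done:
        s_next, r, done = step(s)
        v_s, grad = value_and_grad(params, s)
        v_next = 0.0 if done else value_and_grad(params, s_next)[0]
        delta = r + GAMMA * v_next - v_s  # TD error
        t += 1
        for k in params:
            g = -delta * grad[k]  # semi-gradient of the squared TD error
            m[k] = B1 * m[k] + (1 - B1) * g
            s2[k] = B2 * s2[k] + (1 - B2) * g ** 2
            m_hat = m[k] / (1 - B1 ** t)
            s_hat = s2[k] / (1 - B2 ** t)
            params[k] -= ALPHA * m_hat / (np.sqrt(s_hat) + EPS)
        s = s_next

print("learned values:",
      [round(float(value_and_grad(params, s)[0]), 3) for s in range(N_STATES)])
```

The per-parameter adaptive stepsize removes the need to hand-tune a single global learning rate, which is the sensitivity the abstract identifies as the bottleneck for plain neural TD learning.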