Online Attentive Kernel-Based Off-Policy Temporal Difference Learning

Temporal difference (TD) learning is a powerful framework for value function approximation in reinforcement learning. However, standard TD methods often struggle with feature representation and off-policy learning challenges. In this paper, we propose a novel framework, online attentive kernel-based...

Bibliographic Details
Main Authors:	Shangdong Yang, Shuaiqiang Zhang, Xingguo Chen
Format:	Article
Language:	English
Published:	MDPI AG 2024-11-01
Series:	Applied Sciences
Subjects:	online attentive learning; kernel-based methods; reinforcement learning; off-policy temporal difference learning; two-timescale analysis
Online Access:	https://www.mdpi.com/2076-3417/14/23/11114

Similar Items