Online Attentive Kernel-Based Off-Policy Temporal Difference Learning

Temporal difference (TD) learning is a powerful framework for value function approximation in reinforcement learning. However, standard TD methods often struggle with feature representation and off-policy learning challenges. In this paper, we propose a novel framework, online attentive kernel-based...

Bibliographic Details
Main Authors: Shangdong Yang, Shuaiqiang Zhang, Xingguo Chen
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/14/23/11114

Similar Items