Online Attentive Kernel-Based Off-Policy Temporal Difference Learning
Temporal difference (TD) learning is a powerful framework for value function approximation in reinforcement learning. However, standard TD methods often struggle with feature representation and off-policy learning challenges. In this paper, we propose a novel framework, online attentive kernel-based...
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/14/23/11114 |