Adaptive Noise Exploration for Neural Contextual Multi-Armed Bandits

In contextual multi-armed bandits, the relationship between contextual information and rewards is typically unknown, complicating the trade-off between exploration and exploitation. A common approach to address this challenge is the Upper Confidence Bound (UCB) method, which constructs confidence in...

Bibliographic Details
Main Authors: Chi Wang, Lin Shi, Junru Luo
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Algorithms
Online Access: https://www.mdpi.com/1999-4893/18/2/56