LLM-Guided Ensemble Learning for Contextual Bandits with Copula and Gaussian Process Models

Bibliographic Details
Main Author: Jong-Min Kim
Format: Article
Language: English
Published: MDPI AG 2025-08-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/15/2523
Description
Summary: Contextual multi-armed bandits (CMABs) are vital for sequential decision-making in areas such as recommendation systems, clinical trials, and finance. We propose a simulation framework integrating Gaussian Process (GP)-based CMABs with vine copulas to model dependent contexts and GARCH processes to capture reward volatility. Rewards are generated via copula-transformed Beta distributions to reflect complex joint dependencies and skewness. We evaluate four policies, namely ensemble, Epsilon-greedy, Thompson sampling, and Upper Confidence Bound (UCB), over 10,000 replications, assessing cumulative regret, observed reward, and cumulative reward. While Thompson sampling and LLM-guided policies consistently minimize regret and maximize rewards under varied reward distributions, Epsilon-greedy shows instability, and UCB exhibits moderate performance. Enhancing the ensemble with copula features, GP models, and dynamic policy selection driven by a large language model (LLM) yields superior adaptability and performance. Our results highlight the effectiveness of combining structured probabilistic models with LLM-based guidance for robust, adaptive decision-making in skewed, high-variance environments.
ISSN: 2227-7390
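
The summary outlines a concrete data-generating process: dependent contexts from a copula, GARCH-driven reward volatility, and copula-transformed Beta rewards. The Python sketch below illustrates that pipeline under stated assumptions; it is not the paper's code. A Gaussian copula stands in for the vine copula, and all parameter values (correlation, GARCH coefficients, Beta shapes) are illustrative.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def dependent_contexts(n, dim=3, corr=0.6):
        # Gaussian copula: correlated normals pushed through the normal CDF
        # yield uniform marginals with dependent components.
        cov = corr * np.ones((dim, dim)) + (1.0 - corr) * np.eye(dim)
        z = rng.multivariate_normal(np.zeros(dim), cov, size=n)
        return stats.norm.cdf(z)

    def garch_volatility(n, omega=0.05, alpha=0.10, beta=0.85):
        # GARCH(1,1) recursion: sigma2_t = omega + alpha*eps_{t-1}^2 + beta*sigma2_{t-1}
        sigma2 = np.empty(n)
        sigma2[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
        eps = 0.0
        for t in range(1, n):
            sigma2[t] = omega + alpha * eps**2 + beta * sigma2[t - 1]
            eps = np.sqrt(sigma2[t]) * rng.standard_normal()
        return np.sqrt(sigma2)

    def copula_beta_rewards(u, a=2.0, b=5.0, vol=None):
        # Copula-transformed Beta rewards: a skewed Beta marginal applied to a
        # dependent uniform, optionally scaled by the GARCH volatility path.
        r = stats.beta.ppf(u, a, b)
        return r if vol is None else r * vol

    U = dependent_contexts(1000)
    vol = garch_volatility(1000)
    rewards = copula_beta_rewards(U[:, 0], vol=vol)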
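A toy comparison of the three baseline policies named in the summary (Epsilon-greedy, UCB, and Thompson sampling) on static Beta-distributed arms, tracking cumulative regret as the paper does. The arm means, horizon, and epsilon are assumptions, and the ensemble/LLM-guided policy is omitted for brevity.

    import numpy as np

    rng = np.random.default_rng(1)
    K, T = 4, 2000
    true_means = np.array([0.20, 0.25, 0.30, 0.35])  # assumed per-arm mean rewards

    def run(policy, eps=0.1):
        counts, sums, regret = np.zeros(K), np.zeros(K), 0.0
        for t in range(T):
            means = sums / np.maximum(counts, 1)
            if policy == "eps-greedy":
                arm = int(rng.integers(K)) if rng.random() < eps else int(np.argmax(means))
            elif policy == "ucb":
                bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1e-9))
                arm = int(np.argmax(means + bonus))
            else:  # Gaussian Thompson sampling with a 1/sqrt(n) posterior width
                draws = rng.normal(means, 1.0 / np.sqrt(np.maximum(counts, 1)))
                arm = int(np.argmax(draws))
            reward = rng.beta(2.0, 2.0 / true_means[arm] - 2.0)  # Beta reward with mean true_means[arm]
            counts[arm] += 1
            sums[arm] += reward
            regret += true_means.max() - true_means[arm]
        return regret

    for p in ("eps-greedy", "ucb", "thompson"):
        print(p, round(run(p), 1))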
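One way to realize the GP-based CMAB component, sketched here with scikit-learn (an assumption; the paper does not specify an implementation): each arm keeps a GP over contexts, and Thompson sampling draws from the GP posterior at the current context. Refitting on every draw is simple but slow; an incremental update would be preferable at scale.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(2)

    class GPArm:
        """One arm's reward model: a GP over contexts, refit on each draw."""

        def __init__(self):
            self.X, self.y = [], []
            self.gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)

        def sample(self, x):
            # Thompson step: draw one reward from the GP posterior at context x.
            if not self.X:
                return rng.normal()  # no data yet: fall back to a prior draw
            self.gp.fit(np.array(self.X), np.array(self.y))
            mu, sd = self.gp.predict(x.reshape(1, -1), return_std=True)
            return rng.normal(mu[0], sd[0])

        def update(self, x, r):
            self.X.append(x)
            self.y.append(r)

    arm = GPArm()
    x = np.array([0.2, 0.5, 0.8])  # e.g., a copula-generated context vector
    arm.update(x, 0.4)             # observe a reward for this context
    print(arm.sample(x))           # posterior draw used to rank arms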
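Finally, a loudly hypothetical sketch of the LLM-guided dynamic policy selection: the query_llm() helper and the prompt format are inventions for illustration, not the paper's interface. The idea is that a short summary of recent per-policy regret is sent to an LLM, which names the policy to run next, with a fallback to the empirically best policy if the reply is unusable.

    def query_llm(prompt: str) -> str:
        # Placeholder: substitute a real call to a hosted or local LLM here.
        raise NotImplementedError("wire up an LLM client")

    def llm_select_policy(recent_regret: dict) -> str:
        summary = ", ".join(f"{p}: {r:.2f}" for p, r in recent_regret.items())
        prompt = ("Recent cumulative regret per bandit policy: " + summary +
                  ". Reply with exactly one policy name to run next.")
        try:
            choice = query_llm(prompt).strip()
        except NotImplementedError:
            choice = ""
        # Guard against unusable replies: fall back to the lowest-regret policy.
        return choice if choice in recent_regret else min(recent_regret, key=recent_regret.get)

    print(llm_select_policy({"eps-greedy": 41.2, "ucb": 28.7, "thompson": 19.5}))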