LLM-Guided Ensemble Learning for Contextual Bandits with Copula and Gaussian Process Models
Contextual multi-armed bandits (CMABs) are vital for sequential decision-making in areas such as recommendation systems, clinical trials, and finance. We propose a simulation framework integrating Gaussian Process (GP)-based CMABs with vine copulas to model dependent contexts and GARCH processes to...
| Main Author: | Jong-Min Kim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-08-01 |
| Series: | Mathematics |
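| Subjects: | contextual bandits; Gaussian processes; large language models; functional GARCH; vine copulas; adaptive policy |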
| Online Access: | https://www.mdpi.com/2227-7390/13/15/2523 |
| _version_ | 1849406006204825600 |
|---|---|
| author | Jong-Min Kim |
| author_facet | Jong-Min Kim |
| author_sort | Jong-Min Kim |
| collection | DOAJ |
| description | Contextual multi-armed bandits (CMABs) are vital for sequential decision-making in areas such as recommendation systems, clinical trials, and finance. We propose a simulation framework integrating Gaussian Process (GP)-based CMABs with vine copulas to model dependent contexts and GARCH processes to capture reward volatility. Rewards are generated via copula-transformed Beta distributions to reflect complex joint dependencies and skewness. We evaluate four policies—ensemble, Epsilon-greedy, Thompson, and Upper Confidence Bound (UCB)—over 10,000 replications, assessing cumulative regret, observed reward, and cumulative reward. While Thompson sampling and LLM-guided policies consistently minimize regret and maximize rewards under varied reward distributions, Epsilon-greedy shows instability, and UCB exhibits moderate performance. Enhancing the ensemble with copula features, GP models, and dynamic policy selection driven by a large language model (LLM) yields superior adaptability and performance. Our results highlight the effectiveness of combining structured probabilistic models with LLM-based guidance for robust, adaptive decision-making in skewed, high-variance environments. |
| format | Article |
| id | doaj-art-ebee6eb503134bfea6840290ff03f4f6 |
| institution | Kabale University |
| issn | 2227-7390 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Mathematics |
| spelling | doaj-art-ebee6eb503134bfea6840290ff03f4f6; 2025-08-20T03:36:31Z; eng; MDPI AG; Mathematics; 2227-7390; 2025-08-01; 13; 15; 2523; 10.3390/math13152523; LLM-Guided Ensemble Learning for Contextual Bandits with Copula and Gaussian Process Models; Jong-Min Kim (Statistics Discipline, Division of Science and Mathematics, University of Minnesota-Morris, Morris, MN 56267, USA); Contextual multi-armed bandits (CMABs) are vital for sequential decision-making in areas such as recommendation systems, clinical trials, and finance. We propose a simulation framework integrating Gaussian Process (GP)-based CMABs with vine copulas to model dependent contexts and GARCH processes to capture reward volatility. Rewards are generated via copula-transformed Beta distributions to reflect complex joint dependencies and skewness. We evaluate four policies—ensemble, Epsilon-greedy, Thompson, and Upper Confidence Bound (UCB)—over 10,000 replications, assessing cumulative regret, observed reward, and cumulative reward. While Thompson sampling and LLM-guided policies consistently minimize regret and maximize rewards under varied reward distributions, Epsilon-greedy shows instability, and UCB exhibits moderate performance. Enhancing the ensemble with copula features, GP models, and dynamic policy selection driven by a large language model (LLM) yields superior adaptability and performance. Our results highlight the effectiveness of combining structured probabilistic models with LLM-based guidance for robust, adaptive decision-making in skewed, high-variance environments.; https://www.mdpi.com/2227-7390/13/15/2523; contextual bandits; Gaussian processes; large language models; functional GARCH; vine copulas; adaptive policy |
| spellingShingle | Jong-Min Kim; LLM-Guided Ensemble Learning for Contextual Bandits with Copula and Gaussian Process Models; Mathematics; contextual bandits; Gaussian processes; large language models; functional GARCH; vine copulas; adaptive policy |
| title | LLM-Guided Ensemble Learning for Contextual Bandits with Copula and Gaussian Process Models |
| title_sort | llm guided ensemble learning for contextual bandits with copula and gaussian process models |
| topic | contextual bandits; Gaussian processes; large language models; functional GARCH; vine copulas; adaptive policy |
| url | https://www.mdpi.com/2227-7390/13/15/2523 |
| work_keys_str_mv | AT jongminkim llmguidedensemblelearningforcontextualbanditswithcopulaandgaussianprocessmodels |