Selective Reviews of Bandit Problems in AI via a Statistical View
Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential...
| Main Authors: | Pengjie Zhou, Haoyu Wei, Huiming Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-02-01 |
| Series: | Mathematics |
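| Subjects: | bandit problems; exploration–exploitation; concentration inequalities; sub-Gaussian parameter estimation; minimax rate; functional data analysis |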
| Online Access: | https://www.mdpi.com/2227-7390/13/4/665 |
| _version_ | 1849719411055788032 |
|---|---|
| author | Pengjie Zhou, Haoyu Wei, Huiming Zhang |
| author_sort | Pengjie Zhou |
| collection | DOAJ |
| description | Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration–exploitation trade-offs. Additionally, we explore *K*-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field. |
| format | Article |
| id | doaj-art-47239c6360da4deca01a4f2f244960fe |
| institution | DOAJ |
| issn | 2227-7390 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Mathematics |
| spelling | doaj-art-47239c6360da4deca01a4f2f244960fe; 2025-08-20T03:12:09Z; eng; MDPI AG; Mathematics; 2227-7390; 2025-02-01; vol. 13, no. 4, art. 665; 10.3390/math13040665; Selective Reviews of Bandit Problems in AI via a Statistical View; Pengjie Zhou (0), Haoyu Wei (1), Huiming Zhang (2); (0) Institute of Artificial Intelligence, Beihang University, Beijing 100191, China; (1) Department of Economics, University of California San Diego, La Jolla, CA 92093, USA; (2) Institute of Artificial Intelligence, Beihang University, Beijing 100191, China; Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration–exploitation trade-offs. Additionally, we explore *K*-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.; https://www.mdpi.com/2227-7390/13/4/665; bandit problems; exploration–exploitation; concentration inequalities; sub-Gaussian parameter estimation; minimax rate; functional data analysis |
| title | Selective Reviews of Bandit Problems in AI via a Statistical View |
| topic | bandit problems; exploration–exploitation; concentration inequalities; sub-Gaussian parameter estimation; minimax rate; functional data analysis |
| url | https://www.mdpi.com/2227-7390/13/4/665 |