Selective Reviews of Bandit Problems in AI via a Statistical View

Bibliographic Details
Main Authors: Pengjie Zhou, Haoyu Wei, Huiming Zhang
Format: Article
Language: English
Published: MDPI AG, 2025-02-01
Series: Mathematics
Online Access:https://www.mdpi.com/2227-7390/13/4/665
Collection: DOAJ
ISSN: 2227-7390
Citation: Mathematics, Vol. 13, No. 4, Article 665
DOI: 10.3390/math13040665
Affiliations: Pengjie Zhou and Huiming Zhang, Institute of Artificial Intelligence, Beihang University, Beijing 100191, China; Haoyu Wei, Department of Economics, University of California San Diego, La Jolla, CA 92093, USA

Abstract: Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents to make decisions through interactions with their environment. A key subset includes multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration–exploitation trade-offs. Additionally, we explore K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.

Keywords: bandit problems; exploration–exploitation; concentration inequalities; sub-Gaussian parameter estimation; minimax rate; functional data analysis
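
The abstract contrasts frequentist and Bayesian algorithms for the exploration–exploitation trade-off. As a hedged illustration only (not code from the article), the sketch below runs two textbook policies on a hypothetical three-armed Bernoulli bandit: UCB1, a frequentist index policy whose exploration bonus is derived from Hoeffding's concentration inequality, and Thompson sampling with Beta(1, 1) priors, a Bayesian policy. The function names `ucb1` and `thompson`, the arm means, and the horizon are assumptions chosen for illustration.

```python
import math
import random


def ucb1(means, horizon, rng):
    """Frequentist index policy (UCB1): pull the arm maximizing
    empirical mean + sqrt(2 ln t / n_a), a Hoeffding-style confidence bonus."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret uses the true means
    return regret


def thompson(means, horizon, rng):
    """Bayesian policy (Thompson sampling): draw each arm's mean from its
    Beta posterior and pull the arm with the largest draw."""
    k = len(means)
    alpha = [1.0] * k  # Beta(1, 1) = uniform prior on each mean
    beta = [1.0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        alpha[arm] += reward  # conjugate Beta-Bernoulli update
        beta[arm] += 1.0 - reward
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    arm_means = [0.3, 0.5, 0.7]  # hypothetical Bernoulli arm means
    T = 5000
    print("UCB1 pseudo-regret:", ucb1(arm_means, T, random.Random(0)))
    print("Thompson pseudo-regret:", thompson(arm_means, T, random.Random(0)))
```

Regret here is pseudo-regret, accumulated from the gaps between the best true mean and the pulled arm's true mean, so it isolates the cost of exploration from reward noise. The simulated values depend on the seed and are not a substitute for the minimax regret bounds the review discusses.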