Comparative Evaluation of Mean Cumulative Regret in Multi-Armed Bandit Algorithms: ETC, UCB, Asymptotically Optimal UCB, and TS

This research provides insights into short-term and long-term decision-making in different variants of the Multi-Armed Bandit (MAB) problem, a classic problem in decision-making under uncertainty. In this study, four algorithms – Explore-Then-Commit (ETC), the Upper Confidence Bound (UCB), Asymptotically Optimal UCB, and Thompson Sampling (TS) – are selected to solve MAB problems of numerical and categorical types, where the types represent different reward-value intervals. Each algorithm is applied to each dataset with two different horizons (numbers of iterations) to evaluate its short-term and long-term decision-making ability. All algorithms are then compared on each dataset to determine which is most suitable for each type of MAB problem. The research provides an explicit introduction to the MAB problem and the four algorithms. It concludes that both Asymptotically Optimal UCB and TS are suitable for both short-term and long-term decision-making, with Asymptotically Optimal UCB the most appropriate for the numerical MAB problem and TS the most appropriate for the categorical MAB problem. In contrast, UCB suits only short-term decision-making, and ETC is efficient only in the numerical MAB problem.
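The comparison described in the abstract – mean cumulative regret of ETC, UCB, Asymptotically Optimal UCB, and TS over a fixed horizon – can be sketched as a small simulation. This is a minimal illustration on a Bernoulli bandit, not the paper's actual experimental setup; the arm means, horizon, exploration length `m`, and confidence-bonus forms are assumptions chosen for illustration.

```python
import numpy as np

def run_bandit(policy, means, horizon, rng):
    """Run one policy on a Bernoulli bandit; return cumulative (pseudo-)regret."""
    k = len(means)
    counts = np.zeros(k)   # pulls per arm
    sums = np.zeros(k)     # total reward per arm
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        arm = policy(t, counts, sums, horizon, rng)
        reward = float(rng.random() < means[arm])
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

def etc(m):
    # Explore-Then-Commit: pull each arm m times, then commit to the empirical best.
    def policy(t, counts, sums, horizon, rng):
        k = len(counts)
        if t <= m * k:
            return (t - 1) % k
        return int(np.argmax(sums / counts))
    return policy

def ucb(t, counts, sums, horizon, rng):
    # Horizon-tuned UCB: bonus sqrt(2 log(horizon^2) / n).
    for a in range(len(counts)):
        if counts[a] == 0:
            return a
    idx = sums / counts + np.sqrt(2 * np.log(float(horizon) ** 2) / counts)
    return int(np.argmax(idx))

def asy_ucb(t, counts, sums, horizon, rng):
    # Asymptotically optimal UCB: bonus sqrt(2 log(1 + t log^2 t) / n).
    for a in range(len(counts)):
        if counts[a] == 0:
            return a
    f = 1.0 + t * np.log(t) ** 2
    idx = sums / counts + np.sqrt(2 * np.log(f) / counts)
    return int(np.argmax(idx))

def thompson(t, counts, sums, horizon, rng):
    # Thompson Sampling with independent Beta(1, 1) priors per arm.
    samples = rng.beta(sums + 1, counts - sums + 1)
    return int(np.argmax(samples))

if __name__ == "__main__":
    means = [0.5, 0.6, 0.7]   # illustrative arm means, not from the paper
    rng = np.random.default_rng(0)
    for name, pol in [("ETC(m=50)", etc(50)), ("UCB", ucb),
                      ("AsyOptUCB", asy_ucb), ("TS", thompson)]:
        reg = np.mean([run_bandit(pol, means, 2000, rng) for _ in range(20)])
        print(f"{name}: mean cumulative regret = {reg:.1f}")
```

Averaging regret over repeated runs at two horizons (e.g. 500 and 2000) reproduces the kind of short-term vs. long-term comparison the study performs.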

Bibliographic Details
Main Author: Lei Yicong
Format: Article
Language: English
Published: EDP Sciences, 2025-01-01
Series: ITM Web of Conferences
Online Access: https://www.itm-conferences.org/articles/itmconf/pdf/2025/04/itmconf_iwadi2024_01026.pdf
ISSN: 2271-2097
DOI: 10.1051/itmconf/20257301026
Author Affiliation: School of Science and School of Engineering, Hong Kong University of Science and Technology