Comparative Evaluation of Mean Cumulative Regret in Multi-Armed Bandit Algorithms: ETC, UCB, Asymptotically Optimal UCB, and TS
Saved in:
| Main Author: | Lei Yicong |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | EDP Sciences, 2025-01-01 |
| Series: | ITM Web of Conferences |
| Online Access: | https://www.itm-conferences.org/articles/itmconf/pdf/2025/04/itmconf_iwadi2024_01026.pdf |
| _version_ | 1849705416962867200 |
|---|---|
| author | Lei Yicong |
| collection | DOAJ |
| description | This research provides insights into how to address short-term and long-term decision-making in different variants of the Multi-Armed Bandit (MAB) problem, a classic problem in decision-making under uncertainty. Four algorithms are selected to solve MAB problems of numerical and categorical types: Explore-Then-Commit (ETC), the Upper Confidence Bound (UCB), Asymptotically Optimal UCB, and Thompson Sampling (TS). The two types represent different value intervals. Each algorithm is applied to each dataset with two different horizons (numbers of iterations) to evaluate its short-term and long-term decision-making ability. All algorithms are then run on each dataset to determine which is most suitable for a given type of MAB problem. The research provides an explicit introduction to the MAB problem and the four algorithms. It concludes that both Asymptotically Optimal UCB and TS are suitable for decision-making in both the short and the long term; Asymptotically Optimal UCB is the most appropriate for the numerical MAB problem, while TS is the most appropriate for the categorical one. In contrast, UCB suits only short-term decision-making, and ETC is efficient only in the numerical MAB problem. |
| format | Article |
| id | doaj-art-b47ae3d35e7341edbb325e77de2f4db9 |
| institution | DOAJ |
| issn | 2271-2097 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | EDP Sciences |
| record_format | Article |
| series | ITM Web of Conferences |
| affiliation | School of Science and School of Engineering, Hong Kong University of Science and Technology |
| doi | 10.1051/itmconf/20257301026 |
| volume | 73 |
| article | 01026 |
| title | Comparative Evaluation of Mean Cumulative Regret in Multi-Armed Bandit Algorithms: ETC, UCB, Asymptotically Optimal UCB, and TS |
| url | https://www.itm-conferences.org/articles/itmconf/pdf/2025/04/itmconf_iwadi2024_01026.pdf |
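The comparison described in the abstract can be reproduced in outline with a small simulation. The sketch below is illustrative only, not the paper's code: it estimates mean cumulative regret for ETC, UCB, Asymptotically Optimal UCB, and Thompson Sampling on a Bernoulli bandit, where the arm means, horizons, and ETC exploration budget are assumptions chosen for the example rather than values taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(policy, means, horizon):
    """Simulate one Bernoulli bandit run; return cumulative (pseudo-)regret."""
    k = len(means)
    counts = np.zeros(k)   # pulls per arm
    sums = np.zeros(k)     # total reward per arm
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        arm = policy(t, counts, sums)
        reward = rng.random() < means[arm]  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

def etc(m):
    # Explore-Then-Commit: pull each arm m times, then commit to the
    # empirically best arm for the rest of the horizon.
    def policy(t, counts, sums):
        k = len(counts)
        if t <= m * k:
            return (t - 1) % k
        return int(np.argmax(sums / counts))
    return policy

def ucb(t, counts, sums):
    # UCB1: optimistic index mean + sqrt(2 log t / n_i); pull each arm once first.
    for i in range(len(counts)):
        if counts[i] == 0:
            return i
    return int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))

def ucb_asym(t, counts, sums):
    # Asymptotically optimal UCB: confidence radius sqrt(2 log f(t) / n_i)
    # with f(t) = 1 + t log^2 t.
    for i in range(len(counts)):
        if counts[i] == 0:
            return i
    f = 1 + t * np.log(t) ** 2
    return int(np.argmax(sums / counts + np.sqrt(2 * np.log(f) / counts)))

def thompson(t, counts, sums):
    # Thompson Sampling with independent Beta(1, 1) priors per arm.
    samples = rng.beta(sums + 1, counts - sums + 1)
    return int(np.argmax(samples))

if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]   # assumed arm means (illustrative)
    for horizon in (500, 2000):  # short and long horizons (illustrative)
        print(f"horizon = {horizon}")
        for name, pol in [("ETC", etc(50)), ("UCB", ucb),
                          ("AO-UCB", ucb_asym), ("TS", thompson)]:
            r = np.mean([run_bandit(pol, means, horizon) for _ in range(20)])
            print(f"  {name:7s} mean cumulative regret: {r:.1f}")
```

Averaging regret over repeated runs at a short and a long horizon mirrors the record's evaluation setup; swapping in reward distributions over different value intervals would correspond to its numerical versus categorical problem types.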