Comparative Evaluation of Mean Cumulative Regret in Multi-Armed Bandit Algorithms: ETC, UCB, Asymptotically Optimal UCB, and TS

This research provides insights into short-term and long-term decision-making in different variants of the Multi-Armed Bandit (MAB) problem, a classic problem in decision-making under uncertainty. In this study, four algorithms – Explore-Then-Commit (ETC), the Upper Confidence Bound (UCB), Asymptotically Optimal UCB, and Thompson Sampling (TS) – are selected to solve MAB problems of numerical and categorical types, where the types represent different reward-value intervals. Each algorithm is applied to each dataset with two different horizons (numbers of iterations) to evaluate its short-term and long-term decision-making ability. All algorithms are then compared on each dataset to determine which is most suitable for each type of MAB problem. The research provides an explicit introduction to the MAB problem and the four algorithms. It concludes that both Asymptotically Optimal UCB and TS are suitable for both short-term and long-term decision-making, with Asymptotically Optimal UCB the most appropriate for the numerical MAB problem and TS the most appropriate for the categorical MAB problem. In contrast, UCB suits only short-term decision-making, and ETC is efficient only in the numerical MAB problem.
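The comparison described in the abstract – mean cumulative regret of ETC, UCB, Asymptotically Optimal UCB, and TS over a fixed horizon – can be sketched as a small simulation. This is a minimal illustration on a Bernoulli bandit, not the paper's actual experimental setup; the arm means, horizon, exploration length `m`, and confidence-bonus forms are assumptions chosen for illustration.

```python
import numpy as np

def run_bandit(policy, means, horizon, rng):
    """Run one policy on a Bernoulli bandit; return cumulative (pseudo-)regret."""
    k = len(means)
    counts = np.zeros(k)   # pulls per arm
    sums = np.zeros(k)     # total reward per arm
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        arm = policy(t, counts, sums, horizon, rng)
        reward = float(rng.random() < means[arm])
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

def etc(m):
    # Explore-Then-Commit: pull each arm m times, then commit to the empirical best.
    def policy(t, counts, sums, horizon, rng):
        k = len(counts)
        if t <= m * k:
            return (t - 1) % k
        return int(np.argmax(sums / counts))
    return policy

def ucb(t, counts, sums, horizon, rng):
    # Horizon-tuned UCB: bonus sqrt(2 log(horizon^2) / n).
    for a in range(len(counts)):
        if counts[a] == 0:
            return a
    idx = sums / counts + np.sqrt(2 * np.log(float(horizon) ** 2) / counts)
    return int(np.argmax(idx))

def asy_ucb(t, counts, sums, horizon, rng):
    # Asymptotically optimal UCB: bonus sqrt(2 log(1 + t log^2 t) / n).
    for a in range(len(counts)):
        if counts[a] == 0:
            return a
    f = 1.0 + t * np.log(t) ** 2
    idx = sums / counts + np.sqrt(2 * np.log(f) / counts)
    return int(np.argmax(idx))

def thompson(t, counts, sums, horizon, rng):
    # Thompson Sampling with independent Beta(1, 1) priors per arm.
    samples = rng.beta(sums + 1, counts - sums + 1)
    return int(np.argmax(samples))

if __name__ == "__main__":
    means = [0.5, 0.6, 0.7]   # illustrative arm means, not from the paper
    rng = np.random.default_rng(0)
    for name, pol in [("ETC(m=50)", etc(50)), ("UCB", ucb),
                      ("AsyOptUCB", asy_ucb), ("TS", thompson)]:
        reg = np.mean([run_bandit(pol, means, 2000, rng) for _ in range(20)])
        print(f"{name}: mean cumulative regret = {reg:.1f}")
```

Averaging regret over repeated runs at two horizons (e.g. 500 and 2000) reproduces the kind of short-term vs. long-term comparison the study performs.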

Bibliographic Details
Main Author: Lei Yicong
Format: Article
Language: English
Published: EDP Sciences, 2025-01-01
Series: ITM Web of Conferences
Online Access: https://www.itm-conferences.org/articles/itmconf/pdf/2025/04/itmconf_iwadi2024_01026.pdf
ISSN: 2271-2097
DOI: 10.1051/itmconf/20257301026
Author Affiliation: School of Science and School of Engineering, Hong Kong University of Science and Technology