Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems

This paper develops a sequential decision-making framework called constrained restless multi-armed bandits (CRMABs) to model problems of resource allocation under uncertainty and dynamic availability constraints. The decision-maker’s objective is to maximize the long-term cumulative rewar...

Full description

Saved in:

Bibliographic Details
Main Authors:	Kesav Ram Kaza, Rahul H. Meshram, Varunkumar Mehta, Shabbir N. Merchant
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Markov decision processes restless multi-armed bandits dynamic scheduling cyber-physical systems reinforcement learning sequential decision models
Online Access:	https://ieeexplore.ieee.org/document/10772464/
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	This paper develops a sequential decision-making framework called constrained restless multi-armed bandits (CRMABs) to model problems of resource allocation under uncertainty and dynamic availability constraints. The decision-maker’s objective is to maximize the long-term cumulative reward. This can only be achieved by considering the impact of current actions on the future evolution of states. The uncertainty about the future availability of arms and partial state-information makes this objective challenging. CRMABs can be applied to resource allocation problems in cyber-physical systems, including sensor/relay scheduling. Whittle’s index policy, online rollout, and myopic policies are studied as solutions for CRMABs. First, the conditions for the applicability of Whittle’s index policy are studied, and the indexability result is claimed under some structural assumptions. An algorithm for index computation is presented. The online rollout policy for partially observable CRMABs is proposed as a low-complexity alternative to the index policy, and the complexity of these schemes is analyzed. An upper bound on the optimal value function is derived, which helps assess the sub-optimality of various solutions. The simulation study compares the performance of these policies and shows that the rollout policy is the better performing solution. In some settings it shows about 14% gain relative to Whittle’s index and myopic policies. Finally, an application to the problem of wildfire management is presented. Decision-making using CRMABs is analyzed from the perspective of a central agency tasked with fighting wildfires in multiple regions under logistic constraints.
ISSN:	2169-3536

Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems

Similar Items