Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems

This paper develops a sequential decision-making framework called constrained restless multi-armed bandits (CRMABs) to model resource allocation problems under uncertainty and dynamic availability constraints. The decision-maker’s objective is to maximize the long-term cumulative reward, which can only be achieved by accounting for the impact of current actions on the future evolution of states. Uncertainty about the future availability of arms, together with partial state information, makes this objective challenging. CRMABs apply to resource allocation problems in cyber-physical systems, including sensor/relay scheduling. Whittle’s index policy, online rollout, and myopic policies are studied as solutions for CRMABs. First, the conditions under which Whittle’s index policy is applicable are studied, and an indexability result is established under certain structural assumptions. An algorithm for index computation is presented. The online rollout policy for partially observable CRMABs is then proposed as a low-complexity alternative to the index policy, and the complexity of these schemes is analyzed. An upper bound on the optimal value function is derived, which helps assess the sub-optimality of the various solutions. A simulation study compares the performance of these policies and shows that the rollout policy is the best-performing solution, achieving roughly a 14% gain over the Whittle index and myopic policies in some settings. Finally, an application to wildfire management is presented, in which decision-making with CRMABs is analyzed from the perspective of a central agency tasked with fighting wildfires in multiple regions under logistic constraints.
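
The abstract's mention of Whittle's index and an index-computation algorithm can be made concrete with a small sketch. The snippet below is not the paper's algorithm and ignores the availability constraints and partial observability that the CRMAB setting adds; it only illustrates the textbook idea of a Whittle index for a single fully observed restless arm, found by bisecting on the passive subsidy until the active and passive actions become indifferent in the queried state. All transition matrices and rewards are made-up toy values.

# Illustrative sketch only (not the paper's index-computation algorithm, and
# without the availability constraints or partial observability of CRMABs):
# Whittle index of one fully observed two-action restless arm, obtained by
# bisecting on the passive subsidy until both actions are indifferent.
import numpy as np

def q_values(P_act, P_pas, r_act, r_pas, subsidy, gamma=0.95, iters=5000, tol=1e-10):
    """Value-iterate the single-arm MDP in which the passive action earns an
    extra `subsidy`; return the converged Q-vectors for (passive, active)."""
    V = np.zeros(len(r_act))
    for _ in range(iters):
        q_pas = r_pas + subsidy + gamma * P_pas @ V
        q_act = r_act + gamma * P_act @ V
        V_new = np.maximum(q_pas, q_act)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return r_pas + subsidy + gamma * P_pas @ V, r_act + gamma * P_act @ V

def whittle_index(P_act, P_pas, r_act, r_pas, state, lo=-10.0, hi=10.0, tol=1e-6):
    """Bisect on the subsidy until active and passive are indifferent in `state`.
    Assumes the arm is indexable (the passive set grows monotonically in the subsidy)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q_pas, q_act = q_values(P_act, P_pas, r_act, r_pas, mid)
        if q_act[state] > q_pas[state]:   # active still preferred: subsidy too small
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy two-state arm; every number below is an assumption for illustration.
P_act = np.array([[0.7, 0.3], [0.4, 0.6]])
P_pas = np.array([[0.9, 0.1], [0.2, 0.8]])
r_act = np.array([1.0, 0.3])
r_pas = np.array([0.0, 0.0])
print([round(whittle_index(P_act, P_pas, r_act, r_pas, s), 3) for s in (0, 1)])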

Bibliographic Details
Main Authors: Kesav Ram Kaza, Rahul H. Meshram, Varunkumar Mehta, Shabbir N. Merchant
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access
Subjects: Markov decision processes; restless multi-armed bandits; dynamic scheduling; cyber-physical systems; reinforcement learning; sequential decision models
Online Access: https://ieeexplore.ieee.org/document/10772464/
author Kesav Ram Kaza
Rahul H. Meshram
Varunkumar Mehta
Shabbir N. Merchant
author_facet Kesav Ram Kaza
Rahul H. Meshram
Varunkumar Mehta
Shabbir N. Merchant
author_sort Kesav Ram Kaza
collection DOAJ
description This paper develops a sequential decision-making framework called constrained restless multi-armed bandits (CRMABs) to model resource allocation problems under uncertainty and dynamic availability constraints. The decision-maker’s objective is to maximize the long-term cumulative reward, which can only be achieved by accounting for the impact of current actions on the future evolution of states. Uncertainty about the future availability of arms, together with partial state information, makes this objective challenging. CRMABs apply to resource allocation problems in cyber-physical systems, including sensor/relay scheduling. Whittle’s index policy, online rollout, and myopic policies are studied as solutions for CRMABs. First, the conditions under which Whittle’s index policy is applicable are studied, and an indexability result is established under certain structural assumptions. An algorithm for index computation is presented. The online rollout policy for partially observable CRMABs is then proposed as a low-complexity alternative to the index policy, and the complexity of these schemes is analyzed. An upper bound on the optimal value function is derived, which helps assess the sub-optimality of the various solutions. A simulation study compares the performance of these policies and shows that the rollout policy is the best-performing solution, achieving roughly a 14% gain over the Whittle index and myopic policies in some settings. Finally, an application to wildfire management is presented, in which decision-making with CRMABs is analyzed from the perspective of a central agency tasked with fighting wildfires in multiple regions under logistic constraints.
format Article
id doaj-art-d39e730e59ef4949825a2f1fddcd96b4
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-d39e730e59ef4949825a2f1fddcd96b4
2025-08-20T01:54:38Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2024-01-01, Vol. 12, pp. 182274-182295
DOI 10.1109/ACCESS.2024.3510558, IEEE article no. 10772464
Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
Kesav Ram Kaza (https://orcid.org/0000-0002-9051-4624), School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
Rahul H. Meshram, Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai, India
Varunkumar Mehta (https://orcid.org/0000-0002-5087-8175), Aerospace Research Centre, National Research Council Canada, Ottawa, ON, Canada
Shabbir N. Merchant (https://orcid.org/0000-0002-9119-6795), Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India
https://ieeexplore.ieee.org/document/10772464/
Markov decision processes; restless multi-armed bandits; dynamic scheduling; cyber-physical systems; reinforcement learning; sequential decision models
spellingShingle Kesav Ram Kaza
Rahul H. Meshram
Varunkumar Mehta
Shabbir N. Merchant
Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
IEEE Access
Markov decision processes
restless multi-armed bandits
dynamic scheduling
cyber-physical systems
reinforcement learning
sequential decision models
title Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
title_full Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
title_fullStr Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
title_full_unstemmed Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
title_short Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
title_sort constrained restless bandits for dynamic scheduling in cyber physical systems
topic Markov decision processes
restless multi-armed bandits
dynamic scheduling
cyber-physical systems
reinforcement learning
sequential decision models
url https://ieeexplore.ieee.org/document/10772464/
work_keys_str_mv AT kesavramkaza constrainedrestlessbanditsfordynamicschedulingincyberphysicalsystems
AT rahulhmeshram constrainedrestlessbanditsfordynamicschedulingincyberphysicalsystems
AT varunkumarmehta constrainedrestlessbanditsfordynamicschedulingincyberphysicalsystems
AT shabbirnmerchant constrainedrestlessbanditsfordynamicschedulingincyberphysicalsystems
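
The abstract also reports a simulation comparison of myopic and online rollout policies. The sketch below is a minimal, equally non-authoritative illustration of that comparison: a toy fully observed restless bandit in which M of N arms are activated each slot, a myopic policy that ranks arms by immediate reward gain, and a Monte-Carlo rollout policy that scores each candidate activation set by simulating the myopic base policy a few slots ahead. All dynamics, rewards, and horizons are assumed toy values, not the paper's model; the CRMAB constraint structure and partial observability are not represented.

# Illustrative sketch only (toy dynamics, not the paper's CRMAB model):
# myopic vs. rollout selection of M out of N fully observed restless arms
# per time slot.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 2                      # total arms / arms activated per slot
HORIZON = 50                     # slots per evaluation run
ROLLOUT_H, ROLLOUT_RUNS = 8, 10  # rollout lookahead depth and Monte-Carlo runs

# Per-arm two-state dynamics and rewards (all values made up for illustration).
P_act = np.array([[[0.6, 0.4], [0.3, 0.7]]] * N)   # transitions when activated
P_pas = np.array([[[0.9, 0.1], [0.1, 0.9]]] * N)   # transitions when passive
r_act = rng.uniform(0.5, 1.5, size=(N, 2))         # reward for activating arm i in state s
r_pas = np.zeros((N, 2))                           # passive reward

def step(states, active):
    """Collect the slot reward and sample next states for a chosen active set."""
    reward, nxt = 0.0, states.copy()
    for i, s in enumerate(states):
        P = P_act[i] if i in active else P_pas[i]
        reward += r_act[i, s] if i in active else r_pas[i, s]
        nxt[i] = rng.choice(2, p=P[s])
    return nxt, reward

def myopic(states):
    """Activate the M arms with the largest immediate reward advantage."""
    gain = r_act[np.arange(N), states] - r_pas[np.arange(N), states]
    return set(np.argsort(-gain)[:M])

def rollout(states):
    """Score each candidate activation set by simulating the myopic base
    policy for ROLLOUT_H further slots; return the best-scoring set."""
    best_set, best_val = None, -np.inf
    for cand in combinations(range(N), M):
        total = 0.0
        for _ in range(ROLLOUT_RUNS):
            s, r = step(states.copy(), set(cand))
            acc = r
            for _ in range(ROLLOUT_H):
                s, r = step(s, myopic(s))
                acc += r
            total += acc
        if total > best_val:
            best_set, best_val = set(cand), total
    return best_set

def evaluate(policy):
    """Run one trajectory under `policy` and return the cumulative reward."""
    states, total = np.zeros(N, dtype=int), 0.0
    for _ in range(HORIZON):
        states, r = step(states, policy(states))
        total += r
    return total

print("myopic :", round(evaluate(myopic), 1))
print("rollout:", round(evaluate(rollout), 1))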