Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
This paper develops a sequential decision-making framework called constrained restless multi-armed bandits (CRMABs) to model problems of resource allocation under uncertainty and dynamic availability constraints. The decision-maker’s objective is to maximize the long-term cumulative reward.
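To make the setting concrete: in a restless bandit, every arm is a Markov chain that keeps evolving whether or not it is activated, and the scheduler may activate only a budget of arms per slot, here further limited by which arms happen to be available. The sketch below is purely illustrative (the two-state dynamics, availability probability, and policies are toy assumptions, not the paper's model) and compares a myopic policy against a naive first-come policy:

```python
import random

# Toy restless bandit: each arm is a 2-state Markov chain (0 = bad, 1 = good).
# An activated arm yields its current state as reward; passive arms yield 0
# but still evolve. All transition values here are illustrative placeholders.
P_ACTIVE  = {0: 0.3, 1: 0.8}   # P(next state = good | arm activated)
P_PASSIVE = {0: 0.4, 1: 0.6}   # P(next state = good | arm rested)

def step(states, actions, rng):
    """Advance every arm one slot; return the reward collected."""
    reward = 0.0
    for i, s in enumerate(states):
        if actions[i]:
            reward += s
            p = P_ACTIVE[s]
        else:
            p = P_PASSIVE[s]
        states[i] = 1 if rng.random() < p else 0
    return reward

def myopic(states, available, budget):
    """Activate up to `budget` available arms with the best current state."""
    ranked = sorted(available, key=lambda i: -states[i])
    chosen = set(ranked[:budget])
    return [i in chosen for i in range(len(states))]

def first_come(states, available, budget):
    """Baseline: activate the first `budget` available arms, state-blind."""
    chosen = set(available[:budget])
    return [i in chosen for i in range(len(states))]

def simulate(policy, horizon=2000, n_arms=5, budget=2, seed=0):
    """Average per-slot reward of `policy` under random arm availability."""
    rng = random.Random(seed)
    states = [rng.randint(0, 1) for _ in range(n_arms)]
    total = 0.0
    for _ in range(horizon):
        # Dynamic availability: each arm is usable this slot w.p. 0.8.
        available = [i for i in range(n_arms) if rng.random() < 0.8]
        total += step(states, policy(states, available, budget), rng)
    return total / horizon
```

Even in this toy, the state-aware myopic rule beats the state-blind baseline, which hints at why the paper's index and rollout policies, which additionally look ahead, can do better still.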
| Main Authors: | Kesav Ram Kaza, Rahul H. Meshram, Varunkumar Mehta, Shabbir N. Merchant |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Markov decision processes; restless multi-armed bandits; dynamic scheduling; cyber-physical systems; reinforcement learning; sequential decision models |
| Online Access: | https://ieeexplore.ieee.org/document/10772464/ |
| _version_ | 1850264744159608832 |
|---|---|
| author | Kesav Ram Kaza; Rahul H. Meshram; Varunkumar Mehta; Shabbir N. Merchant |
| author_sort | Kesav Ram Kaza |
| collection | DOAJ |
| description | This paper develops a sequential decision-making framework called constrained restless multi-armed bandits (CRMABs) to model problems of resource allocation under uncertainty and dynamic availability constraints. The decision-maker’s objective is to maximize the long-term cumulative reward, which can only be achieved by accounting for the impact of current actions on the future evolution of states. Uncertainty about the future availability of arms, combined with partial state information, makes this objective challenging. CRMABs can be applied to resource allocation problems in cyber-physical systems, including sensor/relay scheduling. Whittle’s index policy, online rollout, and myopic policies are studied as solutions for CRMABs. First, the conditions for the applicability of Whittle’s index policy are studied, and an indexability result is established under some structural assumptions. An algorithm for index computation is presented. The online rollout policy for partially observable CRMABs is proposed as a low-complexity alternative to the index policy, and the complexity of these schemes is analyzed. An upper bound on the optimal value function is derived, which helps assess the sub-optimality of various solutions. The simulation study compares the performance of these policies and shows that the rollout policy is the best-performing solution; in some settings it achieves about a 14% gain relative to Whittle’s index and myopic policies. Finally, an application to the problem of wildfire management is presented: decision-making using CRMABs is analyzed from the perspective of a central agency tasked with fighting wildfires in multiple regions under logistic constraints. |
| format | Article |
| id | doaj-art-d39e730e59ef4949825a2f1fddcd96b4 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record doaj-art-d39e730e59ef4949825a2f1fddcd96b4, indexed 2025-08-20T01:54:38Z. English. IEEE Access (IEEE), ISSN 2169-3536, 2024-01-01, vol. 12, pp. 182274-182295. DOI: 10.1109/ACCESS.2024.3510558, article no. 10772464. Title: Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems. Authors: Kesav Ram Kaza (https://orcid.org/0000-0002-9051-4624), School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada; Rahul H. Meshram, Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai, India; Varunkumar Mehta (https://orcid.org/0000-0002-5087-8175), Aerospace Research Centre, National Research Council Canada, Ottawa, ON, Canada; Shabbir N. Merchant (https://orcid.org/0000-0002-9119-6795), Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India. Online access: https://ieeexplore.ieee.org/document/10772464/. Keywords: Markov decision processes; restless multi-armed bandits; dynamic scheduling; cyber-physical systems; reinforcement learning; sequential decision models |
| title | Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems |
| title_sort | constrained restless bandits for dynamic scheduling in cyber physical systems |
| topic | Markov decision processes; restless multi-armed bandits; dynamic scheduling; cyber-physical systems; reinforcement learning; sequential decision models |
| url | https://ieeexplore.ieee.org/document/10772464/ |
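The abstract reports the online rollout policy as the best performer. Purely as an illustration of the general one-step rollout idea (the single two-state chain, transition probabilities, activation cost, and myopic base policy below are toy assumptions, not the paper's CRMAB model, which has multiple arms, availability constraints, and partial observability): each candidate first action is scored by Monte Carlo simulation of a cheap base policy thereafter, and the best-scoring action is played.

```python
import random

# One-step rollout on a toy 2-state chain (0 = bad, 1 = good): to pick the
# current action, try each candidate first action, then follow a fixed
# myopic base policy for `horizon` more steps, averaging over `runs`
# Monte Carlo trajectories. All dynamics here are illustrative placeholders.

# P[(state, action)] = probability the next state is 1 (the rewarding state).
P = {(0, 0): 0.4, (0, 1): 0.7, (1, 0): 0.5, (1, 1): 0.9}
COST = {0: 0.0, 1: 0.2}          # activating (action 1) incurs a cost

def reward(state, action):
    return state - COST[action]

def base_policy(state):
    return 1 if state == 1 else 0   # myopic: activate only when in the good state

def rollout_value(state, first_action, rng, horizon=20, runs=1000):
    """Monte Carlo estimate of cumulative reward: play `first_action`,
    then follow the base policy for the rest of the horizon."""
    total = 0.0
    for _ in range(runs):
        s, a, acc = state, first_action, 0.0
        for _ in range(horizon):
            acc += reward(s, a)
            s = 1 if rng.random() < P[(s, a)] else 0
            a = base_policy(s)
        total += acc
    return total / runs

def rollout_policy(state, rng):
    """Greedy one-step rollout: pick the first action with the best estimate."""
    return max((0, 1), key=lambda a: rollout_value(state, a, rng))
```

The appeal of this scheme, and plausibly why the paper proposes it as a low-complexity alternative to index computation, is that it needs only a simulator and a cheap base policy, with cost linear in the number of candidate actions, rollout horizon, and Monte Carlo runs.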