Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
This paper develops a sequential decision-making framework called constrained restless multi-armed bandits (CRMABs) to model problems of resource allocation under uncertainty and dynamic availability constraints. The decision-maker’s objective is to maximize the long-term cumulative rewar...
Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
IEEE
2024-01-01
|
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10772464/ |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | This paper develops a sequential decision-making framework called constrained restless multi-armed bandits (CRMABs) to model problems of resource allocation under uncertainty and dynamic availability constraints. The decision-maker’s objective is to maximize the long-term cumulative reward. This can only be achieved by considering the impact of current actions on the future evolution of states. The uncertainty about the future availability of arms and partial state-information makes this objective challenging. CRMABs can be applied to resource allocation problems in cyber-physical systems, including sensor/relay scheduling. Whittle’s index policy, online rollout, and myopic policies are studied as solutions for CRMABs. First, the conditions for the applicability of Whittle’s index policy are studied, and the indexability result is claimed under some structural assumptions. An algorithm for index computation is presented. The online rollout policy for partially observable CRMABs is proposed as a low-complexity alternative to the index policy, and the complexity of these schemes is analyzed. An upper bound on the optimal value function is derived, which helps assess the sub-optimality of various solutions. The simulation study compares the performance of these policies and shows that the rollout policy is the better performing solution. In some settings it shows about 14% gain relative to Whittle’s index and myopic policies. Finally, an application to the problem of wildfire management is presented. Decision-making using CRMABs is analyzed from the perspective of a central agency tasked with fighting wildfires in multiple regions under logistic constraints. |
|---|---|
| ISSN: | 2169-3536 |