Risk-Aware Reinforcement Learning Framework for User-Centric O-RAN


Bibliographic Details
Main Authors: Shahrukh Khan Kasi, Fahd Ahmed Khan, Sabit Ekin, Ali Imran
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Transactions on Machine Learning in Communications and Networking
Online Access: https://ieeexplore.ieee.org/document/10852269/
Description
Summary: The evolution of Open Radio Access Networks (O-RAN) presents an opportunity to enhance network performance by enabling dynamic orchestration of configuration and optimization parameters (COPs) through online learning methods. However, leveraging this potential requires overcoming the limitations of traditional cell-centric RAN architectures, which lack the necessary flexibility. At the same time, despite their recent popularity, the practical deployment of online learning frameworks, such as Deep Reinforcement Learning (DRL)-based COP optimization solutions, remains limited because they risk deteriorating network performance during the exploration phase. In this article, we propose and analyze a novel risk-aware DRL framework for user-centric RAN (UC-RAN), which offers both the architectural flexibility and the COP optimization needed to exploit it. We investigate and identify UC-RAN COPs that can be optimized via a soft actor-critic algorithm, implementable as an O-RAN application (rApp), to jointly maximize latency satisfaction, reliability satisfaction, area spectral efficiency, and energy efficiency. We use offline learning on UC-RAN to reliably accelerate DRL training, thus minimizing the risk of DRL deteriorating cellular network performance. Results show that our proposed solution approaches near-optimal performance in just a few hundred iterations while reducing the risk score by a factor of ten.
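The abstract describes a soft actor-critic rApp that jointly maximizes latency satisfaction, reliability satisfaction, area spectral efficiency, and energy efficiency. A common way to pose such a joint objective is a weighted scalar reward; the sketch below illustrates that idea only. The weights, normalization constants, and function name are illustrative assumptions, not taken from the article.

```python
# Hypothetical sketch of a joint reward for the four KPIs named in the
# abstract. Weights and normalization caps are assumed for illustration.

def joint_reward(latency_sat, reliability_sat, ase, ee,
                 weights=(0.25, 0.25, 0.25, 0.25),
                 ase_max=10.0, ee_max=5.0):
    """Combine four KPIs into one scalar reward in [0, 1].

    latency_sat, reliability_sat: fraction of users satisfied, in [0, 1].
    ase: area spectral efficiency, normalized by the assumed cap ase_max.
    ee:  energy efficiency, normalized by the assumed cap ee_max.
    """
    kpis = (latency_sat, reliability_sat,
            min(ase / ase_max, 1.0), min(ee / ee_max, 1.0))
    return sum(w * k for w, k in zip(weights, kpis))

# All KPIs at their caps yield the maximum reward.
print(joint_reward(1.0, 1.0, 10.0, 5.0))  # 1.0
```

A DRL agent such as soft actor-critic would receive this scalar after each COP adjustment; the actual reward design used by the authors may differ.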
ISSN: 2831-316X