An improved hybrid policy optimization method for economic-preference dispatch considering cross time-scales collaboration

Bibliographic Details
Main Authors:	Qianli Zhang, Hao Tang, Duanchao Li
Format:	Article
Language:	English
Published:	Elsevier 2025-08-01
Series:	International Journal of Electrical Power & Energy Systems
Subjects:	Real-time active power dispatch; Data drive optimization; Operation preference; Deep reinforcement learning
Online Access:	http://www.sciencedirect.com/science/article/pii/S014206152500300X

Description
Summary:	To achieve real-time active power dispatch (RAPD) efficiently and economically in a large-scale power system with a high proportion of renewable energy, a deep reinforcement learning (DRL)-based dispatching agent is constructed for dynamic RAPD decision-making. However, existing DRL methods respond poorly to the relatively long time-scale power schedule at the dispatching level, which undermines the coordination between the expected schedule and RAPD and leaves them unable to track operation preferences well. In this paper, we present an improved DRL algorithm, called distributed partial-modulation proximal policy optimization (DPMPPO), to address this issue. In DPMPPO, a deep modulation network (DMN) is integrated into the proximal policy optimization (PPO) module; the DMN is designed to capture operation preferences from historical operation data and to correct the policy generated by the PPO module through a specific partial modulation mechanism. In addition, we devise a distributed learning architecture suited to DPMPPO to improve the efficiency of network training. The experiment is conducted on a modified IEEE-300 case. Compared with existing DRL methods, the proposed DPMPPO enables the optimized dispatching policy to collaborate reasonably with the schedules across time scales and to make online RAPD decisions flexibly and economically.
ISSN:	0142-0615
