Federated deep reinforcement learning-based urban traffic signal optimal control

Bibliographic Details
Main Authors: Mi Li, Xiaolong Pan, Chuhui Liu, Zirui Li
Format: Article
Language:English
Published: Nature Portfolio 2025-04-01
Series:Scientific Reports
Online Access:https://doi.org/10.1038/s41598-025-91966-1
Description
Summary: Abstract This paper proposes a cross-domain intelligent traffic signal control method based on federated Proximal Policy Optimization (PPO), in which agents at typical intersections are jointly trained in a distributed fashion across domains. The method addresses the slow learning speed and poor model generalization that arise when deep reinforcement learning (RL) is applied to cross-domain, multi-intersection traffic signal control. Under the premise of ensuring information security and data privacy, the proposed method improves the generalization ability of the local models during global cross-region distributed joint training, handles the non-independent and identically distributed (non-IID) environmental data that different agents face in real intersection scenarios, and significantly accelerates convergence during the model training phase. By carefully designing the state, action and reward functions and determining optimal values for several key parameters of the federated collaboration mechanism, the RL model maintains high learning efficiency and fast convergence even as the road network grows and the state and action spaces expand exponentially with the number of intersections. In addition, a new state-interaction method and reward function allow the agents to collaborate with each other, which greatly improves the efficiency of information exchange between the federated local agents and the central coordinator, improves the access efficiency of the road network, and reduces the volume of transmitted communication data.
Finally, experimental comparisons show that the proposed method reduces the average vehicle waiting time by up to 27.34% compared with the existing fixed-timing method. At the same convergence level, it converges up to 47.69% faster than an individual PPO agent trained in a single local environment, and up to 45.35% faster than an aggregated PPO model trained jointly on all local data. The proposed method effectively optimizes intersection access efficiency and shows excellent robustness under various traffic-flow settings.
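The federated collaboration mechanism described above, in which local PPO agents train on their own intersections and a central coordinator periodically combines their models, can be illustrated with a minimal FedAvg-style parameter-averaging sketch. This is an assumption about the aggregation step, not the paper's exact algorithm; the function and variable names (`aggregate`, `local_params`, `weights`) are hypothetical.

```python
# Minimal sketch of a FedAvg-style aggregation step for federated PPO:
# each local agent uploads its policy parameters (here, a flat list of
# floats), and the coordinator returns their weighted average as the
# new global policy. All names are illustrative, not from the paper.

def aggregate(local_params, weights=None):
    """Return the weighted average of local policy parameter vectors.

    local_params: list of equal-length lists of floats (one per agent).
    weights: optional per-agent weights summing to 1; defaults to uniform.
    """
    n = len(local_params)
    if weights is None:
        weights = [1.0 / n] * n
    dim = len(local_params[0])
    global_params = [0.0] * dim
    for params, w in zip(local_params, weights):
        for i, p in enumerate(params):
            global_params[i] += w * p
    return global_params

# Two local agents with 2-parameter policies, equal weighting:
print(aggregate([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

In a full training loop, this averaging would run every few PPO update rounds, after which the coordinator broadcasts `global_params` back to the agents; only parameters are exchanged, never raw intersection data, which is what preserves data privacy in the federated setting.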
ISSN:2045-2322