Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway

Bibliographic Details
Main Authors: Jiajia Chen, Bingqing Zhu, Mengyu Zhang, Xiang Ling, Xiaobo Ruan, Yifan Deng, Ning Guo
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: World Electric Vehicle Journal
Subjects: autonomous vehicle; platooning control; deep reinforcement learning; multi-agent systems
Online Access: https://www.mdpi.com/2032-6653/16/4/225
author Jiajia Chen
Bingqing Zhu
Mengyu Zhang
Xiang Ling
Xiaobo Ruan
Yifan Deng
Ning Guo
collection DOAJ
description This study presents the first investigation into the problem of an autonomous vehicle (AV) merging into an existing platoon and proposes a multi-agent deep reinforcement learning (MA-DRL)-based cooperative control framework. The MA-DRL architecture enables coordinated learning among multiple autonomous agents, addressing the multi-objective coordination challenge through synchronized control of the platoon longitudinal acceleration and the AV steering and acceleration. To enhance training efficiency, we develop a dual-layer multi-agent maximum Q-value proximal policy optimization (MAMQPPO) method, which extends the multi-agent PPO algorithm (a policy gradient method designed for stable policy updates) by incorporating maximum Q-value action selection for platoon gap control and discrete command generation. Using maximum Q-value action policy optimization to learn platoon gap selection and discrete action commands simplifies the training process. Furthermore, a partially decoupled reward function (PD-Reward) is designed to guide the actions of both AVs and platoons while accelerating network convergence. In highway simulation experiments, the proposed method reduces merging time by 37.69% (12.4 s vs. 19.9 s) and energy consumption by 58% (3.56 kWh vs. 8.47 kWh) compared with an existing baseline that combines a quintic polynomial-based method with PID (proportional–integral–derivative) control.
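The reductions quoted in the description can be checked from the paired values it gives. The following is a minimal sketch of that arithmetic only; the helper name `percent_reduction` is hypothetical and is not taken from the article.

```python
def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline * 100.0


# Paired values quoted in the description (proposed vs. baseline).
merge_time = percent_reduction(baseline=19.9, proposed=12.4)  # seconds
energy = percent_reduction(baseline=8.47, proposed=3.56)      # kWh

print(f"merging time reduction: {merge_time:.2f}%")  # 37.69%
print(f"energy reduction: {energy:.2f}%")            # 57.97%
```

Both results agree with the figures stated in the description: 37.69% for merging time, and about 58% for energy consumption once rounded to a whole percent.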
format Article
id doaj-art-c45e6456bbc9497ca0bd8d90ea666af6
institution DOAJ
issn 2032-6653
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series World Electric Vehicle Journal
spelling doaj-art-c45e6456bbc9497ca0bd8d90ea666af6
2025-08-20T03:13:54Z
eng
MDPI AG
World Electric Vehicle Journal
2032-6653
2025-04-01
16(4), 225
10.3390/wevj16040225
Jiajia Chen: School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China
Bingqing Zhu: School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China
Mengyu Zhang: Hefei Communication Investment Holding Group Co., Ltd., Hefei 230009, China
Xiang Ling: School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China
Xiaobo Ruan: School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China
Yifan Deng: School of Chang’an-Dublin International College of Transportation, Chang’an University, Xi’an 710064, China
Ning Guo: School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China
title Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway
topic autonomous vehicle
platooning control
deep reinforcement learning
multi-agent systems
url https://www.mdpi.com/2032-6653/16/4/225