Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway
This study presents the first investigation into the problem of autonomous vehicle (AV) merging into existing platoons, proposing a multi-agent deep reinforcement learning (MA-DRL)-based cooperative control framework. The developed MA-DRL architecture enables coordinated learning among multiple auto...
| Main Authors: | Jiajia Chen, Bingqing Zhu, Mengyu Zhang, Xiang Ling, Xiaobo Ruan, Yifan Deng, Ning Guo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | World Electric Vehicle Journal |
| Subjects: | autonomous vehicle; platooning control; deep reinforcement learning; multi-agent systems |
| Online Access: | https://www.mdpi.com/2032-6653/16/4/225 |
| _version_ | 1849713726275452928 |
|---|---|
| author | Jiajia Chen Bingqing Zhu Mengyu Zhang Xiang Ling Xiaobo Ruan Yifan Deng Ning Guo |
| author_facet | Jiajia Chen Bingqing Zhu Mengyu Zhang Xiang Ling Xiaobo Ruan Yifan Deng Ning Guo |
| author_sort | Jiajia Chen |
| collection | DOAJ |
| description | This study presents the first investigation into the problem of autonomous vehicle (AV) merging into existing platoons, proposing a multi-agent deep reinforcement learning (MA-DRL)-based cooperative control framework. The developed MA-DRL architecture enables coordinated learning among multiple autonomous agents to address the multi-objective coordination challenge through synchronized control of the platoon's longitudinal acceleration and the AV's steering and acceleration. To enhance training efficiency, we develop a dual-layer multi-agent maximum Q-value proximal policy optimization (MAMQPPO) method, which extends the multi-agent PPO algorithm (a policy gradient method ensuring stable policy updates) by incorporating maximum Q-value action selection for platoon gap control and discrete command generation. Learning platoon gap selection and discrete action commands through maximum Q-value action policy optimization simplifies the training process. Furthermore, a partially decoupled reward function (PD-Reward) is designed to properly guide the behavioral actions of both AVs and platoons while accelerating network convergence. Comprehensive highway simulation experiments show that the proposed method reduces merging time by 37.69% (12.4 s vs. 19.9 s) and energy consumption by 58% (3.56 kWh vs. 8.47 kWh) compared with the existing quintic polynomial-based + PID (Proportional–Integral–Derivative) method. |
| format | Article |
| id | doaj-art-c45e6456bbc9497ca0bd8d90ea666af6 |
| institution | DOAJ |
| issn | 2032-6653 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | World Electric Vehicle Journal |
| spelling | doaj-art-c45e6456bbc9497ca0bd8d90ea666af6; 2025-08-20T03:13:54Z; eng; MDPI AG; World Electric Vehicle Journal; 2032-6653; 2025-04-01; 16; 4; 225; 10.3390/wevj16040225; Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway; Jiajia Chen [0]; Bingqing Zhu [1]; Mengyu Zhang [2]; Xiang Ling [3]; Xiaobo Ruan [4]; Yifan Deng [5]; Ning Guo [6]; [0] School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China; [1] School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China; [2] Hefei Communication Investment Holding Group Co., Ltd., Hefei 230009, China; [3] School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China; [4] School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China; [5] School of Chang’an-Dublin International College of Transportation, Chang’an University, Xi’an 710064, China; [6] School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, China; This study presents the first investigation into the problem of autonomous vehicle (AV) merging into existing platoons, proposing a multi-agent deep reinforcement learning (MA-DRL)-based cooperative control framework. The developed MA-DRL architecture enables coordinated learning among multiple autonomous agents to address the multi-objective coordination challenge through synchronized control of the platoon's longitudinal acceleration and the AV's steering and acceleration. To enhance training efficiency, we develop a dual-layer multi-agent maximum Q-value proximal policy optimization (MAMQPPO) method, which extends the multi-agent PPO algorithm (a policy gradient method ensuring stable policy updates) by incorporating maximum Q-value action selection for platoon gap control and discrete command generation. Learning platoon gap selection and discrete action commands through maximum Q-value action policy optimization simplifies the training process. Furthermore, a partially decoupled reward function (PD-Reward) is designed to properly guide the behavioral actions of both AVs and platoons while accelerating network convergence. Comprehensive highway simulation experiments show that the proposed method reduces merging time by 37.69% (12.4 s vs. 19.9 s) and energy consumption by 58% (3.56 kWh vs. 8.47 kWh) compared with the existing quintic polynomial-based + PID (Proportional–Integral–Derivative) method.; https://www.mdpi.com/2032-6653/16/4/225; autonomous vehicle; platooning control; deep reinforcement learning; multi-agent systems |
| spellingShingle | Jiajia Chen Bingqing Zhu Mengyu Zhang Xiang Ling Xiaobo Ruan Yifan Deng Ning Guo Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway World Electric Vehicle Journal autonomous vehicle platooning control deep reinforcement learning multi-agent systems |
| title | Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway |
| title_full | Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway |
| title_fullStr | Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway |
| title_full_unstemmed | Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway |
| title_short | Multi-Agent Deep Reinforcement Learning Cooperative Control Model for Autonomous Vehicle Merging into Platoon in Highway |
| title_sort | multi agent deep reinforcement learning cooperative control model for autonomous vehicle merging into platoon in highway |
| topic | autonomous vehicle platooning control deep reinforcement learning multi-agent systems |
| url | https://www.mdpi.com/2032-6653/16/4/225 |
| work_keys_str_mv | AT jiajiachen multiagentdeepreinforcementlearningcooperativecontrolmodelforautonomousvehiclemergingintoplatooninhighway AT bingqingzhu multiagentdeepreinforcementlearningcooperativecontrolmodelforautonomousvehiclemergingintoplatooninhighway AT mengyuzhang multiagentdeepreinforcementlearningcooperativecontrolmodelforautonomousvehiclemergingintoplatooninhighway AT xiangling multiagentdeepreinforcementlearningcooperativecontrolmodelforautonomousvehiclemergingintoplatooninhighway AT xiaoboruan multiagentdeepreinforcementlearningcooperativecontrolmodelforautonomousvehiclemergingintoplatooninhighway AT yifandeng multiagentdeepreinforcementlearningcooperativecontrolmodelforautonomousvehiclemergingintoplatooninhighway AT ningguo multiagentdeepreinforcementlearningcooperativecontrolmodelforautonomousvehiclemergingintoplatooninhighway |
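
The percentage reductions quoted in the description can be reproduced from the raw values it reports. The sketch below is only an arithmetic check, not code from the article; the function name `relative_reduction` and the variable names are hypothetical, and it assumes the percentages are computed relative to the baseline (quintic polynomial-based + PID) values.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Return the percentage by which `proposed` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline * 100.0


# Values taken from the record's description field.
# Assumption: the baseline is the quintic polynomial-based + PID method.
merge_time_cut = relative_reduction(baseline=19.9, proposed=12.4)  # seconds
energy_cut = relative_reduction(baseline=8.47, proposed=3.56)      # kWh

print(f"merging time reduction: {merge_time_cut:.2f}%")
print(f"energy reduction: {energy_cut:.2f}%")
```

Under that assumption the merging-time figure matches the quoted 37.69% to two decimals, and the energy figure comes out just under 58%, consistent with the rounded value in the description.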