Charging Station Management Strategy for Returns Maximization via Improved TD3 Deep Reinforcement Learning
| Main Authors: | Hengjie Li, Jianghao Zhu, Yun Zhou, Qi Feng, Donghan Feng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2022-01-01 |
| Series: | International Transactions on Electrical Energy Systems |
| Online Access: | http://dx.doi.org/10.1155/2022/6854620 |
| Field | Value |
|---|---|
| author | Hengjie Li, Jianghao Zhu, Yun Zhou, Qi Feng, Donghan Feng |
| collection | DOAJ |
| description | Maximizing the return on electric vehicle charging station (EVCS) operation encourages further EVCS deployment, which in turn expands the electric vehicle (EV) stock and helps address climate change. However, in dynamic regulation scenarios with large data volumes, many variables, and short time scales, existing regulation strategies that aim to maximize EVCS returns often fail to meet demand. To handle these increasingly complex scenarios, this paper uses a deep reinforcement learning (DRL) algorithm based on an improved twin delayed deep deterministic policy gradient (TD3) to construct basic energy management strategies. To make the strategy better suited to real-time energy regulation, we use Thompson sampling to improve TD3's exploration-noise sampling, which greatly accelerates TD3's initial convergence during training. We also use marginalised importance sampling to compute the Q-return function for TD3, which makes the constructed strategies more likely to learn from high-value experiences while improving their robustness. Numerical experiments show that the charging station management strategy (CSMS) based on the modified TD3 converges fastest, is the most robust, and achieves the largest operational returns compared with CSMSs constructed using deep deterministic policy gradient (DDPG), actor-critic using Kronecker-factored trust region (ACKTR), trust region policy optimization (TRPO), proximal policy optimization (PPO), soft actor-critic (SAC), and the original TD3. |
| format | Article |
| id | doaj-art-ec7d650448af47daa7ab83446a74bcd2 |
| institution | OA Journals |
| issn | 2050-7038 |
| language | English |
| publishDate | 2022-01-01 |
| publisher | Wiley |
| record_format | Article |
| series | International Transactions on Electrical Energy Systems |
| affiliation | School of Electrical Engineering and Information Engineering (all five authors) |
| title | Charging Station Management Strategy for Returns Maximization via Improved TD3 Deep Reinforcement Learning |
| url | http://dx.doi.org/10.1155/2022/6854620 |
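The abstract says Thompson sampling is used to improve TD3's exploration-noise sampling but does not give the formulation. What follows is a minimal sketch under stated assumptions, not the authors' method. It treats a few candidate Gaussian noise scales as arms of a Beta-Bernoulli bandit and uses Thompson sampling to pick the scale for each episode. The class `ThompsonNoiseSelector`, the candidate scales, the success probabilities in the demo, and the success criterion are all hypothetical.

```python
import random
from typing import List, Optional


class ThompsonNoiseSelector:
    """Beta-Bernoulli Thompson sampling over candidate exploration-noise scales.

    Hypothetical sketch: each arm is one Gaussian noise standard deviation.
    The caller decides what counts as a "success" for the arm that was used.
    """

    def __init__(self, sigmas: List[float], seed: Optional[int] = None) -> None:
        self.sigmas = list(sigmas)
        # Uniform Beta(1, 1) prior on every arm's success probability.
        self.alpha = [1.0] * len(self.sigmas)
        self.beta = [1.0] * len(self.sigmas)
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Draw one plausible success rate per arm from its Beta posterior
        # and pick the arm with the largest draw.
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=lambda i: draws[i])

    def update(self, arm: int, success: bool) -> None:
        # Conjugate posterior update for a Bernoulli outcome.
        if success:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0

    def noisy_action(self, action: List[float], arm: int,
                     low: float = -1.0, high: float = 1.0) -> List[float]:
        # Perturb a deterministic actor output with Gaussian noise of the
        # selected scale, then clip to the action bounds.
        sigma = self.sigmas[arm]
        return [min(high, max(low, a + self.rng.gauss(0.0, sigma))) for a in action]


if __name__ == "__main__":
    # Toy demo with made-up success probabilities; arm 1 is best.
    selector = ThompsonNoiseSelector([0.05, 0.1, 0.3], seed=0)
    env_rng = random.Random(1)
    success_prob = [0.2, 0.8, 0.3]
    counts = [0, 0, 0]
    for _ in range(2000):
        arm = selector.select()
        counts[arm] += 1
        selector.update(arm, env_rng.random() < success_prob[arm])
    print(counts)  # pulls should concentrate on arm 1
```

In a training loop, one might call `select()` once per episode, perturb the actor's actions with `noisy_action`, and afterwards call `update(arm, episode_return > baseline)`. Here `baseline` is a hypothetical running average of recent returns. A Beta-Bernoulli bandit is used only because its posterior update is a simple count. A Gaussian posterior over episode returns would be an equally plausible reading of the abstract.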