Design of AUV controller based on improved PPO algorithm

Objective: To improve the robustness of autonomous underwater vehicle (AUV) controllers against environment modeling errors, this paper proposes a reinforcement learning control strategy that introduces contextual information and a curriculum-learning training mechanism. Method: First, the contextual in...

Bibliographic Details
Main Authors: Desheng XU, Chunhui XU
Format: Article
Language: English
Published: Editorial Office of Chinese Journal of Ship Research, 2025-02-01
Series: Zhongguo Jianchuan Yanjiu
Subjects: autonomous underwater vehicles; controllers; reinforcement learning; curriculum learning; context variables
Online Access: http://www.ship-research.com/en/article/doi/10.19693/j.issn.1673-3185.04031
_version_ 1850045930685857792
author Desheng XU
Chunhui XU
author_facet Desheng XU
Chunhui XU
author_sort Desheng XU
collection DOAJ
description Objective: To improve the robustness of autonomous underwater vehicle (AUV) controllers against environment modeling errors, this paper proposes a reinforcement learning control strategy that introduces contextual information and a curriculum-learning training mechanism. Method: First, contextual information is embedded into the policy network by using interaction history data as part of the policy network input; second, a curriculum-learning training mechanism is designed that gradually increases the disturbance strength during training, avoiding the training instability and premature termination caused by excessive disturbance. Fixed-depth control experiments are conducted in a simulation environment, and the effectiveness of the algorithm is further verified on a real AUV in a water tank. Results: The experimental results show that the proposed algorithm improves the convergence speed by 25.00% and the steady-state reward by 10.81%, effectively improving the training process, and achieves depth tracking without steady-state error in the simulation environment. In the tank experiment, compared with the domain randomization algorithm and the baseline algorithm, the average depth tracking error of the proposed method is reduced by 45.81% and 63.00% respectively, and its standard deviation is reduced by 36.17% and 52.76% respectively, effectively improving tracking accuracy and stability. Conclusion: The results of this study can provide useful references for applying deep reinforcement learning methods to AUV control.
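The context-embedding step described in the Method section can be illustrated with a short sketch. The PyTorch snippet below is a minimal, hypothetical rendering, assuming a fixed-length window of past (state, action) pairs and illustrative layer sizes; the record does not specify the paper's actual network architecture, history length, or dimensions.

    import torch
    import torch.nn as nn

    class ContextPolicy(nn.Module):
        """PPO actor whose input is augmented with an encoding of recent
        interaction history (the contextual information). All sizes are
        illustrative assumptions, not the authors' design."""

        def __init__(self, state_dim=4, action_dim=1, history_len=8, ctx_dim=16):
            super().__init__()
            # Encode the flattened window of past (state, action) pairs
            # into a compact context vector.
            self.context_encoder = nn.Sequential(
                nn.Linear(history_len * (state_dim + action_dim), 64),
                nn.Tanh(),
                nn.Linear(64, ctx_dim),
            )
            # Actor head conditioned on the current state plus the context.
            self.actor = nn.Sequential(
                nn.Linear(state_dim + ctx_dim, 64),
                nn.Tanh(),
                nn.Linear(64, action_dim),
                nn.Tanh(),  # actions normalized to [-1, 1]
            )

        def forward(self, state, history):
            # state:   (batch, state_dim)
            # history: (batch, history_len, state_dim + action_dim)
            ctx = self.context_encoder(history.flatten(start_dim=1))
            return self.actor(torch.cat([state, ctx], dim=-1))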
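The curriculum-learning mechanism (gradually increasing disturbance strength during training) could be realized with a simple schedule such as the one below. The warm-up fraction, the linear ramp, and the environment call in the usage comment are assumptions for illustration; the record only states that the strength is increased gradually.

    def disturbance_strength(episode: int, total_episodes: int,
                             max_strength: float = 1.0,
                             warmup_frac: float = 0.2) -> float:
        """Curriculum schedule: zero disturbance during an initial warm-up,
        then a linear ramp to full strength. The shape and warm-up fraction
        are illustrative assumptions."""
        warmup = int(warmup_frac * total_episodes)
        if episode < warmup:
            return 0.0
        ramp = (episode - warmup) / max(1, total_episodes - warmup)
        return min(max_strength, ramp * max_strength)

    # Hypothetical use inside a PPO training loop:
    # for ep in range(total_episodes):
    #     env.set_current_disturbance(disturbance_strength(ep, total_episodes))
    #     run_ppo_episode(env, policy)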
format Article
id doaj-art-373a211f2b764083a631dc1a4305a566
institution DOAJ
issn 1673-3185
language English
publishDate 2025-02-01
publisher Editorial Office of Chinese Journal of Ship Research
record_format Article
series Zhongguo Jianchuan Yanjiu
spelling doaj-art-373a211f2b764083a631dc1a4305a566
2025-08-20T02:54:35Z | eng | Editorial Office of Chinese Journal of Ship Research
Zhongguo Jianchuan Yanjiu | ISSN 1673-3185 | 2025-02-01 | Vol. 20, No. 1, pp. 350-359
DOI: 10.19693/j.issn.1673-3185.04031 | Article ID: ZG4031
Design of AUV controller based on improved PPO algorithm
Desheng XU; Chunhui XU (both: State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China)
http://www.ship-research.com/en/article/doi/10.19693/j.issn.1673-3185.04031
Keywords: autonomous underwater vehicles; controllers; reinforcement learning; curriculum learning; context variables
spellingShingle Desheng XU
Chunhui XU
Design of AUV controller based on improved PPO algorithm
Zhongguo Jianchuan Yanjiu
autonomous underwater vehicles
controllers
reinforcement learning
curriculum learning
context variables
title Design of AUV controller based on improved PPO algorithm
title_full Design of AUV controller based on improved PPO algorithm
title_fullStr Design of AUV controller based on improved PPO algorithm
title_full_unstemmed Design of AUV controller based on improved PPO algorithm
title_short Design of AUV controller based on improved PPO algorithm
title_sort design of auv controller based on improved ppo algorithm
topic autonomous underwater vehicles
controllers
reinforcement learning
curriculum learning
context variables
url http://www.ship-research.com/en/article/doi/10.19693/j.issn.1673-3185.04031
work_keys_str_mv AT deshengxu designofauvcontrollerbasedonimprovedppoalgorithm
AT chunhuixu designofauvcontrollerbasedonimprovedppoalgorithm