Learning‐based tracking control of AUV: Mixed policy improvement and game‐based disturbance rejection

Abstract A mixed adaptive dynamic programming (ADP) scheme based on zero‐sum game theory is developed to address optimal control problems of autonomous underwater vehicle (AUV) systems subject to disturbances and safety constraints. By combining prior dynamic knowledge with actual sampled data, the pro...

Bibliographic Details
Main Authors:	Jun Ye, Hongbo Gao, Manjiang Hu, Yougang Bian, Qingjia Cui, Xiaohui Qin, Rongjun Ding
Format:	Article
Language:	English
Published:	Wiley 2025-04-01
Series:	CAAI Transactions on Intelligence Technology
Subjects:	adaptive dynamic programming autonomous underwater vehicle game theory optimal control reinforcement learning
Online Access:	https://doi.org/10.1049/cit2.12372

collection	DOAJ
description	Abstract A mixed adaptive dynamic programming (ADP) scheme based on zero‐sum game theory is developed to address optimal control problems of autonomous underwater vehicle (AUV) systems subject to disturbances and safety constraints. By combining prior dynamic knowledge with actual sampled data, the proposed approach mitigates the errors caused by an inaccurate dynamic model and improves the training speed of the ADP algorithm. Initially, the dataset is populated with reference data collected from a nominal model that does not account for modelling bias. The controlled vehicle then interacts with the real environment and continuously adds sampled data to the dataset. To leverage the advantages of both model‐based and model‐free methods during training, an adaptive tuning factor is introduced. It is computed from the dataset, which contains model‐referenced information and conforms to the distribution of the real‐world environment, and it balances the influence of the model‐based control law and the data‐driven policy gradient on the direction of policy improvement. As a result, the proposed approach learns faster than data‐driven methods while also achieving better tracking performance than model‐based control methods. Moreover, the optimal control problem under disturbances is formulated as a zero‐sum game, and an actor‐critic‐disturbance framework is introduced to approximate the optimal control input, cost function, and disturbance policy, respectively. Furthermore, the convergence of the proposed algorithm, which is based on value iteration, is analysed. Finally, an example of AUV path following based on improved line‐of‐sight guidance is presented to demonstrate the effectiveness of the proposed method.
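The zero‐sum game and value iteration mentioned in the description can be illustrated on a toy problem. The sketch below is not the authors' AUV algorithm: it assumes a hypothetical scalar linear system x[k+1] = a*x + b*u + d*w with stage cost q*x^2 + r*u^2 - gamma^2*w^2 and a quadratic value function V(x) = p*x^2. All names and parameter values are illustrative assumptions.

```python
def zero_sum_value_iteration(a, b, d, q, r, gamma, max_iters=1000, tol=1e-12):
    """Value iteration for a scalar linear-quadratic zero-sum game (toy example).

    Dynamics:   x[k+1] = a*x + b*u + d*w   (u: control, w: disturbance)
    Stage cost: q*x^2 + r*u^2 - gamma^2*w^2
    Value:      V_j(x) = p_j * x^2, starting from V_0 = 0.
    """
    p = 0.0
    for _ in range(max_iters):
        # A saddle point of the min-max problem needs gamma^2 - d^2*p > 0.
        if gamma**2 - d**2 * p <= 0.0:
            raise ValueError("gamma too small: no saddle point for this p")
        denom = 1.0 + p * (b**2 / r - d**2 / gamma**2)
        # Game Riccati update obtained by minimising over u and maximising over w.
        p_next = q + a**2 * p / denom
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def saddle_point_gains(p, a, b, d, r, gamma):
    """Return (k_u, k_w) for the policies u = -k_u * x and w = +k_w * x."""
    denom = 1.0 + p * (b**2 / r - d**2 / gamma**2)
    k_u = a * b * p / (r * denom)
    k_w = a * d * p / (gamma**2 * denom)
    return k_u, k_w


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    a, b, d, q, r, gamma = 0.9, 1.0, 0.5, 1.0, 1.0, 2.0
    p = zero_sum_value_iteration(a, b, d, q, r, gamma)
    k_u, k_w = saddle_point_gains(p, a, b, d, r, gamma)
    print("value coefficient p:", p)
    print("control gain k_u:", k_u, "disturbance gain k_w:", k_w)
```

In an actor‐critic‐disturbance setting, the three quantities computed here in closed form (p, k_u, k_w) would presumably be replaced by function approximators for the cost function, control input, and disturbance policy.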
id	doaj-art-54c4d7eca72d4e3aa3d6ba7fe411efa6
institution	OA Journals
issn	2468-2322
spelling	Record doaj-art-54c4d7eca72d4e3aa3d6ba7fe411efa6, timestamp 2025-08-20T02:18:39Z, language eng. CAAI Transactions on Intelligence Technology (Wiley, ISSN 2468-2322), vol. 10, no. 2, pp. 510-528, 2025-04-01. DOI: 10.1049/cit2.12372
affiliations	Jun Ye, Manjiang Hu, Yougang Bian, Qingjia Cui, Xiaohui Qin, Rongjun Ding: State Key Laboratory of Advanced Design and Manufacturing Technology for Vehicle, College of Mechanical and Vehicle Engineering, Hunan University, Changsha, China
	Hongbo Gao: School of Information Science and Technology, and Institute of Advanced Technology, University of Science and Technology of China, Hefei, China

Similar Items