Learning‐based tracking control of AUV: Mixed policy improvement and game‐based disturbance rejection

Bibliographic Details
Main Authors: Jun Ye, Hongbo Gao, Manjiang Hu, Yougang Bian, Qingjia Cui, Xiaohui Qin, Rongjun Ding
Format: Article
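Language: English
Published: Wiley 2025-04-01
Series: CAAI Transactions on Intelligence Technology
Subjects: adaptive dynamic programming; autonomous underwater vehicle; game theory; optimal control; reinforcement learning
Online Access: https://doi.org/10.1049/cit2.12372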
collection DOAJ
description Abstract A mixed adaptive dynamic programming (ADP) scheme based on zero‐sum game theory is developed to address optimal control problems of autonomous underwater vehicle (AUV) systems subject to disturbances and safety constraints. By combining prior dynamic knowledge with actually sampled data, the proposed approach mitigates the defects caused by an inaccurate dynamic model and significantly improves the training speed of the ADP algorithm. Initially, the dataset is enriched with sufficient reference data collected from a nominal model that does not account for modelling bias. In addition, the controlled object interacts with the real environment and continuously adds sampled data to the dataset. To leverage the advantages of both model‐based and model‐free methods during training, an adaptive tuning factor is introduced. It is derived from this dataset, which carries model‐referenced information while conforming to the distribution of the real‐world environment, and it balances the influence of the model‐based control law and the data‐driven policy gradient on the direction of policy improvement. As a result, the proposed approach learns faster than data‐driven methods while also achieving better tracking performance than model‐based control methods. Moreover, the optimal control problem under disturbances is formulated as a zero‐sum game, and an actor‐critic‐disturbance framework is introduced to approximate the optimal control input, the cost function, and the disturbance policy, respectively. Furthermore, the convergence of the proposed algorithm, which is based on value iteration, is analysed. Finally, an example of AUV path following based on improved line‐of‐sight guidance is presented to demonstrate the effectiveness of the proposed method.
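The abstract poses disturbance rejection as a zero‐sum game and analyses convergence under value iteration. As a hedged illustration of that idea only, the sketch below runs value iteration on the simplest instance of such a game: an assumed linear discrete‐time model x_{k+1} = A x_k + B u_k + E w_k with stage cost x'Qx + u'Ru - gamma^2 w'w, where the control u minimises and the disturbance w maximises. The linear model, the matrices, gamma and the helper names `saddle_gains` and `zero_sum_value_iteration` are hypothetical; the paper itself works with AUV dynamics and actor‐critic‐disturbance approximators, which this sketch does not reproduce.

```python
import numpy as np


def saddle_gains(P, A, B, E, R, gamma):
    """Feedback gains (Ku, Kw), with u = -Ku x and w = -Kw x, for the value matrix P."""
    m, q = B.shape[1], E.shape[1]
    G = np.hstack([B, E])  # stacked input matrix [B E]
    # Hessian of the Bellman min-max with respect to the stacked input [u; w]
    H = G.T @ P @ G + np.block([
        [R, np.zeros((m, q))],
        [np.zeros((q, m)), -gamma**2 * np.eye(q)],
    ])
    K = np.linalg.solve(H, G.T @ P @ A)  # stationarity condition of the saddle point
    return K[:m], K[m:]


def zero_sum_value_iteration(A, B, E, Q, R, gamma, iters=1000, tol=1e-12):
    """Value iteration P_{k+1} = Q + A'P_k A - A'P_k [B E] K_k, starting from P_0 = 0.

    Hypothetical linear model: x_{k+1} = A x_k + B u_k + E w_k,
    stage cost x'Qx + u'Ru - gamma^2 w'w (u minimises, w maximises).
    """
    G = np.hstack([B, E])
    P = np.zeros_like(A, dtype=float)
    for _ in range(iters):
        Ku, Kw = saddle_gains(P, A, B, E, R, gamma)
        P_next = Q + A.T @ P @ A - A.T @ P @ G @ np.vstack([Ku, Kw])
        P_next = 0.5 * (P_next + P_next.T)  # keep P symmetric against round-off
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # A saddle point requires the disturbance player's problem to be concave
    if np.min(np.linalg.eigvalsh(gamma**2 * np.eye(E.shape[1]) - E.T @ P @ E)) <= 0:
        raise ValueError("gamma too small: no saddle point at this attenuation level")
    Ku, Kw = saddle_gains(P, A, B, E, R, gamma)
    return P, Ku, Kw


# Usage on a scalar example (all numbers are illustrative assumptions)
A = np.array([[0.9]])
B = np.array([[1.0]])
E = np.array([[0.5]])
Q = np.eye(1)
R = np.eye(1)
P, Ku, Kw = zero_sum_value_iteration(A, B, E, Q, R, gamma=2.0)
print("P =", P, "Ku =", Ku, "Kw =", Kw)
```

Starting from P_0 = 0 and iterating the Bellman min-max mirrors the value‐iteration setting named in the abstract. Making gamma very large removes the disturbance player, and the recursion then reduces to the ordinary LQR Riccati iteration, which is a convenient sanity check.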
id doaj-art-54c4d7eca72d4e3aa3d6ba7fe411efa6
institution OA Journals
issn 2468-2322
volume 10
issue 2
pages 510-528
affiliation State Key Laboratory of Advanced Design and Manufacturing Technology for Vehicle, College of Mechanical and Vehicle Engineering, Hunan University, Changsha, China (Jun Ye, Manjiang Hu, Yougang Bian, Qingjia Cui, Xiaohui Qin, Rongjun Ding)
School of Information Science and Technology, and Institute of Advanced Technology, University of Science and Technology of China, Hefei, China (Hongbo Gao)
topic adaptive dynamic programming
autonomous underwater vehicle
game theory
optimal control
reinforcement learning