Exploration design for Q-learning-based adaptive linear quadratic optimal regulators under stochastic disturbances
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2025-12-01 |
| Series: | SICE Journal of Control, Measurement, and System Integration |
| Subjects: | |
| Online Access: | http://dx.doi.org/10.1080/18824889.2025.2470502 |
| Summary: | This study considers a discrete-time, linear state-feedback control strategy rooted in Q-learning, a Reinforcement Learning (RL) approach, to address an adaptive Linear Quadratic (LQ) problem under stochastic disturbances. Q-learning optimizes the state-action policy by iteratively estimating the Q-function. This study proposes an exploration signal design for a bias-free Q-learning algorithm that modifies the recursively defined Q-function by adding a disturbance-influenced term and uses an Instrumental Variable (IV) technique to resolve the bias-error issue. Two exploration signal designs are introduced: a decaying method and a halt method. The decaying method gradually reduces the exploration signal over time, enabling effective initial exploration while stabilizing as the system approaches optimal performance. The halt method further minimizes exploration by maintaining the signal only when necessary, helping the learning process converge without interfering with the convergence of the estimates to their true values. By balancing exploration against convergence stability, both approaches avoid the instability and inefficiency commonly caused by large, constant exploration signals in Q-learning for LQ control. Numerical simulations demonstrate that the proposed exploration signal designs improve learning efficiency, support consistent convergence, and enhance overall system stability. |
| ISSN: | 1884-9970 |
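
As a rough, self-contained illustration of the decaying exploration method described in the summary, the following Python sketch runs Q-learning policy iteration on a small LQ problem with a geometrically decaying exploration signal. The plant matrices, noise level, decay rate, and rollout length are illustrative assumptions, not taken from the article, and the plain least-squares Bellman fit used here is exactly the kind of estimate that is biased under stochastic disturbances; the article's Instrumental Variable correction is omitted.

```python
# Minimal sketch (not the article's algorithm): Q-learning policy iteration
# for a discrete-time LQ regulator with a geometrically decaying exploration
# signal, in the spirit of the "decaying method" described in the summary.
import numpy as np

rng = np.random.default_rng(0)

# Assumed stable 2-state, 1-input plant with small process noise.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q_cost = np.eye(2)   # state penalty
R_cost = np.eye(1)   # input penalty
n, m = 2, 1
nz = n + m           # dimension of z = [x; u]

def quad_features(z):
    """Features of the quadratic Q-function: Q(x, u) = phi(z) @ theta."""
    out = []
    for i in range(nz):
        for j in range(i, nz):
            scale = 1.0 if i == j else 2.0   # off-diagonal symmetry
            out.append(scale * z[i] * z[j])
    return np.array(out)

def theta_to_H(theta):
    """Rebuild the symmetric Q-function matrix H from its parameter vector."""
    H = np.zeros((nz, nz))
    k = 0
    for i in range(nz):
        for j in range(i, nz):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

K = np.zeros((m, n))   # initial stabilizing policy u = -K x (A is stable)
sigma = 1.0            # initial exploration amplitude (assumed)
decay = 0.7            # per-iteration decay factor (assumed)

for it in range(12):
    # Roll out the current policy plus decaying exploration noise.
    x = rng.normal(size=n)
    Phi, targets = [], []
    for k in range(200):
        e = sigma * rng.normal(size=m)        # exploration signal
        u = -K @ x + e
        w = 0.01 * rng.normal(size=n)         # stochastic disturbance
        x_next = A @ x + B @ u + w
        u_next = -K @ x_next                  # on-policy next input
        cost = x @ Q_cost @ x + u @ R_cost @ u
        # Bellman equation Q(z_k) = cost_k + Q(z_{k+1}) in feature space.
        Phi.append(quad_features(np.concatenate([x, u]))
                   - quad_features(np.concatenate([x_next, u_next])))
        targets.append(cost)
        x = x_next
    # Plain least squares: biased under disturbances (no IV correction here).
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    H = theta_to_H(theta)
    # Policy improvement from the Q-function blocks: K = H_uu^{-1} H_ux.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])
    sigma *= decay                            # decaying exploration

print("learned gain K:\n", K)

# Reference: the true LQR gain from iterating the discrete-time Riccati equation.
P = np.eye(n)
for _ in range(500):
    G = np.linalg.solve(R_cost + B.T @ P @ B, B.T @ P @ A)
    P = Q_cost + A.T @ P @ (A - B @ G)
print("Riccati gain:\n", np.linalg.solve(R_cost + B.T @ P @ B, B.T @ P @ A))
```

The halt method from the summary would replace the unconditional `sigma *= decay` with a rule that keeps the exploration signal only while the gain estimate is still moving; since the record does not specify that stopping criterion, it is not reproduced here.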