Exploration design for Q-learning-based adaptive linear quadratic optimal regulators under stochastic disturbances


Bibliographic Details
Main Authors: Vina Putri Virgiani, Shiro Masuda
Format: Article
Language: English
Published: Taylor & Francis Group 2025-12-01
Series: SICE Journal of Control, Measurement, and System Integration
Subjects: adaptive control; linear quadratic optimal control; reinforcement learning; q-learning; exploration signal; stochastic disturbances; instrumental variable
Online Access: http://dx.doi.org/10.1080/18824889.2025.2470502
collection DOAJ
description This study considers a discrete-time, linear state-feedback control strategy rooted in Q-learning, a Reinforcement Learning (RL) approach, to address an adaptive Linear Quadratic (LQ) problem under stochastic disturbances. Q-learning optimizes the state-action policy by estimating the Q-function iteratively. This study proposes an exploration signal design for the bias-free Q-learning algorithm, which modifies the recursively defined Q-function by adding a disturbance-influenced term and uses an Instrumental Variable (IV) technique to resolve the bias-error issue. For the exploration signal design, this study introduces two methods: a decaying method and a halt method. The decaying method gradually reduces the exploration signal over time, enabling effective initial exploration while stabilizing as the system approaches optimal performance. The halt method further minimizes exploration by maintaining the signal level only when necessary, helping the learning process converge without interfering with the progression of the estimates toward their true values. By balancing exploration with convergence stability, both approaches prevent the instability and inefficiency commonly associated with large, constant exploration signals in Q-learning for LQ control. Numerical simulations demonstrate that the proposed exploration signal design improves learning efficiency, supports consistent convergence, and enhances overall system stability.
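The decaying exploration schedule described in the abstract can be sketched in a toy Q-learning LQ loop. This is an illustration only, not the authors' algorithm: the scalar plant, all numerical values, and the plain least-squares Bellman fit below are assumptions, and the paper's instrumental-variable bias correction is not reproduced (with the small disturbance used here the least-squares bias is negligible).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar plant: x_{k+1} = a*x_k + b*u_k + w_k.
# (The paper treats the dynamics as unknown; a and b are used here
# only to generate data, never inside the learning update.)
a, b = 0.9, 0.5
q, r = 1.0, 1.0          # stage cost q*x^2 + r*u^2
sigma_w = 0.01           # stochastic disturbance level

def features(x, u):
    # Basis for the quadratic Q-function:
    # Q(x, u) = Hxx*x^2 + 2*Hxu*x*u + Huu*u^2
    return np.array([x * x, 2.0 * x * u, u * u])

K = 0.0                  # initial stabilizing gain, u = -K*x
sigma0, rho = 1.0, 0.999  # decaying exploration: sigma_k = sigma0 * rho**k
# Halt variant (sketch): instead of decaying, hold sigma_k constant and
# set it to zero once successive parameter estimates stop changing.
x, k = 1.0, 0

for _ in range(10):                  # policy-iteration sweeps
    Phi, C = [], []
    for _ in range(100):             # one batch of transitions
        e = sigma0 * rho**k * rng.standard_normal()  # decaying exploration
        u = -K * x + e
        cost = q * x * x + r * u * u
        x_next = a * x + b * u + sigma_w * rng.standard_normal()
        u_next = -K * x_next         # greedy action under current policy
        # Policy-evaluation residual: Q(x,u) - Q(x',u') = cost(x,u)
        Phi.append(features(x, u) - features(x_next, u_next))
        C.append(cost)
        x, k = x_next, k + 1
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(C), rcond=None)
    Hxx, Hxu, Huu = theta
    K = Hxu / Huu                    # policy improvement

# Closed-form LQR gain via scalar Riccati iteration, for comparison.
P = q
for _ in range(500):
    P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
K_opt = a * b * P / (r + b * b * P)
print(f"learned K = {K:.3f}, optimal K = {K_opt:.3f}")
```

Because the exploration signal shrinks geometrically, early batches are well excited while late batches run nearly on-policy, which is the trade-off between excitation and convergence stability the abstract describes.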
id doaj-art-9fe17d928d144acebdc78d008b0d5a57
institution OA Journals
issn 1884-9970
topic adaptive control
linear quadratic optimal control
reinforcement learning
q-learning
exploration signal
stochastic disturbances
instrumental variable