Optimization of fluid control laws through deep reinforcement learning using dynamic mode decomposition as the environment

The optimization of fluid control laws through deep reinforcement learning (DRL) presents a challenge owing to the considerable computational costs associated with trial-and-error processes. In this study, we examine the feasibility of deriving an effective control law using a reduced-order model constructed by dynamic mode decomposition with control (DMDc). DMDc is a method of modal analysis of a flow field that incorporates external inputs, and we utilize it to represent the time development of flow in the DRL environment. We also examine the amount of computation time saved by this method. We adopt the optimization problem of the control law for managing lift fluctuations caused by the Kármán vortex shedding in the flow around a cylinder. The deep deterministic policy gradient (DDPG) is used as the DRL algorithm. The external input for the DMDc model consists of a superposition of a chirp signal, containing various amplitudes and frequencies, and random noise. This combination is used to express random actions during the exploration phase. With DRL in a DMDc environment, a control law that exceeds the performance of conventional mathematical control is derived, although the learning is unstable (not converged). This lack of convergence is also observed with DRL in a computational fluid dynamics (CFD) environment. However, when the number of learning epochs is the same, a superior control law is obtained with DRL in a DMDc environment. This outcome could be attributed to the DMDc representation of the flow field, which tends to smooth out high-frequency fluctuations even when subjected to signals of larger amplitude. In addition, using DMDc results in a computation time savings of up to a factor of 3 compared to using CFD.

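The abstract describes replacing the CFD solver with a DMDc reduced-order model as the DRL environment. As a rough orientation only, the following is a minimal sketch (not the authors' code) of how a DMDc operator pair can be identified from snapshot and input data and then stepped forward in time; in practice the snapshots would typically first be projected onto a truncated SVD/POD basis, which is omitted here, and all variable names are illustrative.

```python
import numpy as np

def fit_dmdc(X, Xp, U):
    """Identify a linear model x_{k+1} ~= A x_k + B u_k from snapshot data.

    X  : (n_states, m) snapshots x_1 .. x_m
    Xp : (n_states, m) time-shifted snapshots x_2 .. x_{m+1}
    U  : (n_inputs, m) control inputs applied between the snapshot pairs
    """
    Omega = np.vstack([X, U])            # stacked state/input data matrix
    G = Xp @ np.linalg.pinv(Omega)       # least-squares fit of [A B]
    n = X.shape[0]
    A, B = G[:, :n], G[:, n:]
    return A, B

def step(A, B, x, u):
    """Advance the reduced-order 'environment' by one time step."""
    return A @ x + B @ u
```

Once A and B are identified, the DRL agent can interact with `step` instead of a full CFD time march, which is where the reported computation-time savings would come from.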

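The abstract also mentions driving the system with a superposition of a chirp signal and random noise so that the identification data resemble the random actions taken during the DRL exploration phase. Below is a minimal sketch of how such an excitation signal could be assembled with scipy.signal.chirp; the amplitudes, frequency band, and noise level are placeholders, not values from the paper.

```python
import numpy as np
from scipy.signal import chirp

# Placeholder time resolution and signal length.
dt, n_steps = 1e-2, 4000
t = np.arange(n_steps) * dt

# Chirp sweeping through a band of frequencies, with a slowly varying
# amplitude envelope so that several amplitudes are covered.
envelope = 0.5 + 0.5 * np.sin(2 * np.pi * 0.05 * t)
sweep = envelope * chirp(t, f0=0.1, f1=2.0, t1=t[-1], method="linear")

# Random noise superposed to mimic the exploratory (random) actions.
rng = np.random.default_rng(0)
u = sweep + 0.1 * rng.standard_normal(n_steps)
```
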
Bibliographic Details
Main Authors: T. Sakamoto, K. Okabayashi
Format: Article
Language: English
Published: AIP Publishing LLC, 2024-11-01
Series: AIP Advances
Online Access: http://dx.doi.org/10.1063/5.0237682
collection DOAJ
id doaj-art-330fb9e9baf046e3b7cf2a020fbdba37
institution OA Journals
issn 2158-3226
Author affiliations: T. Sakamoto and K. Okabayashi, Department of Mechanical Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka, Japan
Citation: AIP Advances 14(11), 115204 (2024). doi:10.1063/5.0237682