Enhanced Reward Function Design for Source Term Estimation Based on Deep Reinforcement Learning

This study investigates the design of reward functions for deep reinforcement learning-based source term estimation (STE). Estimating the properties of unknown hazardous gas leakage using a mobile sensor, known as STE problems, is challenging due to environmental turbulence and sensor noise. To address this issue, the particle filter is employed to estimate the source term under noisy sensor measurements, and the deep Q-network is used to find the optimal source search policy. In deep reinforcement learning, selecting an appropriate reward function is crucial as it directly impacts the learning performance. Specifically, this paper first reviews existing reward functions based on penalty, distance, concentration, and entropy metrics. To overcome the limitations of existing rewards, we combine their strengths and propose new reward functions such as the Gaussian mixture model (GMM) variance-based reward and the GMM information gain-based reward. To validate the robustness of the proposed approach, simulations are conducted in two types of environments, basic and turbulent, by adjusting the parameters of the noise condition. The simulation results demonstrate that the proposed reward functions outperform existing ones and are particularly robust in noisy environments.
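The abstract describes a particle filter that maintains a posterior over the unknown source term, with rewards built on posterior variance reduction and information gain. As a loose illustration only (a toy isotropic dispersion model, Gaussian sensor noise, and hypothetical names throughout; the paper's GMM-based rewards additionally fit a Gaussian mixture to the particles, which is omitted here), the core reward computations could be sketched as:

```python
import numpy as np

# Toy sketch: particles are candidate (x, y) source locations; a sensor
# measurement reweights them Bayesically, and the reward is how much the
# posterior shrinks (variance reduction) or loses entropy (information gain).
# The dispersion model and noise level are simplified stand-ins, not the
# paper's actual models.
rng = np.random.default_rng(0)

N = 1000
particles = rng.uniform(0.0, 50.0, size=(N, 2))  # candidate source positions
weights = np.full(N, 1.0 / N)                    # uniform prior

def expected_concentration(src, sensor):
    # Crude isotropic stand-in for a plume dispersion model.
    d = np.linalg.norm(src - sensor, axis=-1) + 1e-6
    return 1.0 / d

def bayes_update(weights, particles, sensor, z, noise_std=0.05):
    # Reweight particles by the Gaussian likelihood of measurement z.
    pred = expected_concentration(particles, sensor)
    like = np.exp(-0.5 * ((z - pred) / noise_std) ** 2)
    w = weights * like
    return w / w.sum()

def total_variance(w, particles):
    # Weighted total variance of the source-location posterior.
    mean = (w[:, None] * particles).sum(axis=0)
    return (w[:, None] * (particles - mean) ** 2).sum()

def entropy(w):
    # Shannon entropy of the particle weights.
    return -(w * np.log(w + 1e-300)).sum()

true_src = np.array([30.0, 20.0])
sensor = np.array([25.0, 25.0])
z = expected_concentration(true_src, sensor) + rng.normal(0.0, 0.05)

new_w = bayes_update(weights, particles, sensor, z)
# Variance-based reward: reduction in posterior variance.
r_var = total_variance(weights, particles) - total_variance(new_w, particles)
# Entropy-based reward: information gain of the update.
info_gain = entropy(weights) - entropy(new_w)
```

Since the prior is uniform (the maximum-entropy discrete distribution), the information gain in this sketch is always non-negative; the variance-based reward can in principle be negative when a measurement spreads belief out, which is one motivation the abstract gives for combining reward designs.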


Bibliographic Details
Main Authors: Junhee Lee, Hongro Jang, Minkyu Park, Hyondong Oh
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Source term estimation; deep reinforcement learning; deep Q-network; reward function; Bayesian inference; particle filter
Online Access: https://ieeexplore.ieee.org/document/11004010/
Collection: DOAJ (record id: doaj-art-400def8a55bc4de2ba718d67cc472bde)
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3569827
Volume 13 (2025), pp. 87777-87792, article number 11004010
Author affiliations:
Junhee Lee (ORCID: 0009-0009-8680-0298), Department of Mechanical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulju-gun, Ulsan, South Korea
Hongro Jang (ORCID: 0000-0002-3919-1348), Department of Mechanical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulju-gun, Ulsan, South Korea
Minkyu Park (ORCID: 0000-0002-6148-8244), Department of Mechanical Engineering, Changwon National University, Uichang-gu, Changwon, Republic of Korea
Hyondong Oh (ORCID: 0000-0002-1051-9477), Department of Mechanical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulju-gun, Ulsan, South Korea