LazyAct: Lazy actor with dynamic state skip based on constrained MDP.

Deep reinforcement learning has achieved significant success in complex decision-making tasks. However, the high computational cost of policies based on deep neural networks restricts their practical application. Specifically, each decision made by an agent requires a complete neural network computation, leading to a linear increase in computational cost with the number of interactions and agents. Inspired by human decision-making patterns, which involve reasoning only about critical states in continuous decision-making tasks rather than all states, we introduce the LazyAct algorithm. This algorithm significantly reduces the number of inferences while preserving the quality of the policy. First, we incorporate a state-skipping branch into the actor network to bypass states with minimal impact. Next, we establish optimization objectives with cost constraints for single-agent and multi-agent inference, based on the IMPALA and MAPPO frameworks, respectively. Finally, we train the policy network using pre-training and fine-tuning. Extensive experiments indicate that LazyAct reduces the number of inferences by approximately 80% and 40% in single-agent and multi-agent scenarios, respectively, while sustaining comparable policy performance. This reduction in inferences significantly decreases the time and FLOPs LazyAct requires to complete tasks. Code is available at https://www.dropbox.com/scl/fo/wyoqo6q9gyt86zobfgbvx/h?rlkey=0moyxsnoiisfs9y4h89hsou1l&dl=0.


Bibliographic Details
Main Authors: Hongjie Zhang, Zhenyu Chen, Hourui Deng, Chaosheng Feng
Format: Article
Language:English
Published: Public Library of Science (PLoS) 2025-01-01
Series:PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0318778
_version_ 1823856797506600960
author Hongjie Zhang
Zhenyu Chen
Hourui Deng
Chaosheng Feng
author_facet Hongjie Zhang
Zhenyu Chen
Hourui Deng
Chaosheng Feng
author_sort Hongjie Zhang
collection DOAJ
description Deep reinforcement learning has achieved significant success in complex decision-making tasks. However, the high computational cost of policies based on deep neural networks restricts their practical application. Specifically, each decision made by an agent requires a complete neural network computation, leading to a linear increase in computational cost with the number of interactions and agents. Inspired by human decision-making patterns, which involve reasoning only about critical states in continuous decision-making tasks rather than all states, we introduce the LazyAct algorithm. This algorithm significantly reduces the number of inferences while preserving the quality of the policy. First, we incorporate a state-skipping branch into the actor network to bypass states with minimal impact. Next, we establish optimization objectives with cost constraints for single-agent and multi-agent inference, based on the IMPALA and MAPPO frameworks, respectively. Finally, we train the policy network using pre-training and fine-tuning. Extensive experiments indicate that LazyAct reduces the number of inferences by approximately 80% and 40% in single-agent and multi-agent scenarios, respectively, while sustaining comparable policy performance. This reduction in inferences significantly decreases the time and FLOPs LazyAct requires to complete tasks. Code is available at https://www.dropbox.com/scl/fo/wyoqo6q9gyt86zobfgbvx/h?rlkey=0moyxsnoiisfs9y4h89hsou1l&dl=0.
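
The description outlines the mechanism only at a high level: an actor network augmented with a state-skipping branch, trained under an inference-cost constraint. Below is a minimal, hypothetical PyTorch sketch of such an actor; the head names, the parameterization of the skip branch (predicting how many upcoming states to bypass by repeating the last action), and all sizes are illustrative assumptions, not the authors' implementation (the actual code is at the Dropbox link above).

# Minimal sketch (not the authors' code): an actor with an auxiliary "skip"
# head. Assumption: the skip head predicts how many upcoming states to bypass
# by repeating the chosen action, so no forward pass is needed on those steps.
import torch
import torch.nn as nn

class LazyActorSketch(nn.Module):
    def __init__(self, obs_dim, n_actions, max_skip=4, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, n_actions)   # policy branch
        self.skip_head = nn.Linear(hidden, max_skip + 1)  # 0..max_skip states to skip

    def forward(self, obs):
        h = self.trunk(obs)
        return self.action_head(h), self.skip_head(h)

def rollout_step(net, obs, state):
    """Run the network only when no skip budget remains; otherwise repeat
    the cached action, saving one full forward pass on this step."""
    if state["skip_left"] > 0:
        state["skip_left"] -= 1
        return state["last_action"], state
    with torch.no_grad():
        action_logits, skip_logits = net(obs)
        action = torch.distributions.Categorical(logits=action_logits).sample()
        skip = torch.distributions.Categorical(logits=skip_logits).sample()
    state.update(last_action=action, skip_left=int(skip))
    return action, state

In a cost-constrained setup of this kind, the fraction of skipped states would be limited by a constraint term in the training objective; the sketch above only shows the inference-time saving, not that training procedure.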
format Article
id doaj-art-0a9e64275deb4572a4a820e8fe914cf3
institution Kabale University
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-0a9e64275deb4572a4a820e8fe914cf32025-02-12T05:30:53ZengPublic Library of Science (PLoS)PLoS ONE1932-62032025-01-01202e031877810.1371/journal.pone.0318778LazyAct: Lazy actor with dynamic state skip based on constrained MDP.Hongjie ZhangZhenyu ChenHourui DengChaosheng FengDeep reinforcement learning has achieved significant success in complex decision-making tasks. However, the high computational cost of policies based on deep neural networks restricts their practical application. Specifically, each decision made by an agent requires a complete neural network computation, leading to a linear increase in computational cost with the number of interactions and agents. Inspired by human decision-making patterns, which involve reasoning only about critical states in continuous decision-making tasks rather than all states, we introduce the LazyAct algorithm. This algorithm significantly reduces the number of inferences while preserving the quality of the policy. First, we incorporate a state-skipping branch into the actor network to bypass states with minimal impact. Next, we establish optimization objectives with cost constraints for single-agent and multi-agent inference, based on the IMPALA and MAPPO frameworks, respectively. Finally, we train the policy network using pre-training and fine-tuning. Extensive experiments indicate that LazyAct reduces the number of inferences by approximately 80% and 40% in single-agent and multi-agent scenarios, respectively, while sustaining comparable policy performance. This reduction in inferences significantly decreases the time and FLOPs LazyAct requires to complete tasks. Code is available at https://www.dropbox.com/scl/fo/wyoqo6q9gyt86zobfgbvx/h?rlkey=0moyxsnoiisfs9y4h89hsou1l&dl=0.https://doi.org/10.1371/journal.pone.0318778
spellingShingle Hongjie Zhang
Zhenyu Chen
Hourui Deng
Chaosheng Feng
LazyAct: Lazy actor with dynamic state skip based on constrained MDP.
PLoS ONE
title LazyAct: Lazy actor with dynamic state skip based on constrained MDP.
title_full LazyAct: Lazy actor with dynamic state skip based on constrained MDP.
title_fullStr LazyAct: Lazy actor with dynamic state skip based on constrained MDP.
title_full_unstemmed LazyAct: Lazy actor with dynamic state skip based on constrained MDP.
title_short LazyAct: Lazy actor with dynamic state skip based on constrained MDP.
title_sort lazyact lazy actor with dynamic state skip based on constrained mdp
url https://doi.org/10.1371/journal.pone.0318778
work_keys_str_mv AT hongjiezhang lazyactlazyactorwithdynamicstateskipbasedonconstrainedmdp
AT zhenyuchen lazyactlazyactorwithdynamicstateskipbasedonconstrainedmdp
AT houruideng lazyactlazyactorwithdynamicstateskipbasedonconstrainedmdp
AT chaoshengfeng lazyactlazyactorwithdynamicstateskipbasedonconstrainedmdp