A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning
| Main Authors: | Ruoyu Sun, Yue Xi, Angelos Stefanidis, Zhengyong Jiang, Jionglong Su |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | Hierarchical deep reinforcement learning; Portfolio optimization; Learning system; Multi-agent |
| Online Access: | https://doi.org/10.1007/s40747-025-01884-y |
| _version_ | 1850207476843020288 |
|---|---|
| author | Ruoyu Sun; Yue Xi; Angelos Stefanidis; Zhengyong Jiang; Jionglong Su |
| author_facet | Ruoyu Sun; Yue Xi; Angelos Stefanidis; Zhengyong Jiang; Jionglong Su |
| author_sort | Ruoyu Sun |
| collection | DOAJ |
| description | Abstract Deep reinforcement learning (DRL) has been extensively used to address portfolio optimization problems. DRL agents acquire knowledge and make decisions through unsupervised interactions with their environment, without requiring explicit knowledge of the joint dynamics of portfolio assets. Among DRL algorithms, the combination of actor-critic methods with deep function approximators is the most widely used. We find that training a DRL agent with an actor-critic algorithm and deep function approximators may lead to scenarios in which the improvement in the agent's risk-adjusted profitability is insignificant. We argue that such situations arise primarily from two problems: sparsity of positive rewards and the curse of dimensionality. These limitations prevent DRL agents from comprehensively learning asset price change patterns in the training environment; as a result, the agents cannot effectively explore dynamic portfolio optimization policies that improve risk-adjusted profitability during training. To address these problems, we propose a novel multi-agent learning system based on a hierarchical deep reinforcement learning (HDRL) algorithmic framework. Under this framework, the agents work together as a learning system for portfolio optimization. Specifically, by designing an auxiliary agent that works with the executive agent to explore the optimal policy, the learning system can focus its exploration on policies with higher risk-adjusted return within the region of the action space yielding positive returns and low variance. The performance of the proposed learning system is evaluated on a portfolio of 29 stocks from the Dow Jones index in four different experiments. During training, the objective functions of both the actor and the critic ultimately achieve stable convergence, and the risk-adjusted profitability of our learning system in the training environment is significantly improved. Hence, we demonstrate that the policies executed by our learning system in the out-of-sample experiments originate from the DRL agents' comprehensive learning of asset price change patterns in the training environment. Furthermore, we find that adopting the auxiliary agent and the HDRL training algorithm can efficiently overcome the curse of dimensionality and improve training efficiency in an environment with sparse positive rewards. In each back-test experiment, the proposed learning system is compared with sixteen traditional strategies and ten machine-learning-based strategies in terms of profitability and risk control. The empirical results of the four evaluation experiments demonstrate the efficacy of our learning system, which outperforms all other strategies by at least 8.2% in terms of the Sharpe ratio, Sortino ratio, and Calmar ratio. This indicates that the policies learned in the training environment exhibit excellent generalization ability in the back-testing experiments. |
| format | Article |
| id | doaj-art-ec6be2ea4b9d468aa9fb76a5290d2c14 |
| institution | OA Journals |
| issn | 2199-4536; 2198-6053 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Springer |
| record_format | Article |
| series | Complex & Intelligent Systems |
| spelling | Complex & Intelligent Systems, vol. 11, no. 7, art. 141, published 2025-05-01, https://doi.org/10.1007/s40747-025-01884-y. Author affiliations: Ruoyu Sun, Angelos Stefanidis, Zhengyong Jiang, and Jionglong Su: School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University; Yue Xi: Department of Educational Studies, School of Academy of Future Education, Xi’an Jiaotong-Liverpool University. |
| title | A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning |
| topic | Hierarchical deep reinforcement learning; Portfolio optimization; Learning system; Multi-agent |
| url | https://doi.org/10.1007/s40747-025-01884-y |
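The abstract reports performance in terms of the Sharpe, Sortino, and Calmar ratios. For reference, below is a minimal sketch of how these standard risk-adjusted metrics are commonly computed from a series of periodic portfolio returns. It is not code from the paper: the 252-day annualization convention, the zero risk-free-rate default, and all function names are illustrative assumptions.

```python
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    # Mean per-period excess return over its standard deviation, annualized.
    excess = np.asarray(returns) - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def sortino_ratio(returns, risk_free=0.0, periods_per_year=252):
    # Like Sharpe, but only downside deviations enter the denominator.
    excess = np.asarray(returns) - risk_free / periods_per_year
    downside_dev = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return np.sqrt(periods_per_year) * excess.mean() / downside_dev

def calmar_ratio(returns, periods_per_year=252):
    # Annualized compound return divided by maximum drawdown of the equity curve.
    wealth = np.cumprod(1.0 + np.asarray(returns))
    annual_return = wealth[-1] ** (periods_per_year / len(wealth)) - 1.0
    max_drawdown = np.max(1.0 - wealth / np.maximum.accumulate(wealth))
    return annual_return / max_drawdown

# Example on synthetic daily returns (roughly three years, purely illustrative):
rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=5e-4, scale=0.01, size=756)
print(sharpe_ratio(daily_returns),
      sortino_ratio(daily_returns),
      calmar_ratio(daily_returns))
```

Under these conventions, the Sharpe ratio normalizes excess return by total volatility, the Sortino ratio by downside volatility only, and the Calmar ratio by the worst peak-to-trough loss, which is why the abstract cites all three as complementary measures of risk-adjusted profitability.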