A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning

Bibliographic Details
Main Authors: Ruoyu Sun, Yue Xi, Angelos Stefanidis, Zhengyong Jiang, Jionglong Su
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Complex & Intelligent Systems
Online Access: https://doi.org/10.1007/s40747-025-01884-y
Collection: DOAJ
Abstract: Deep reinforcement learning (DRL) has been extensively used to address portfolio optimization problems. DRL agents acquire knowledge and make decisions through unsupervised interactions with their environment, without requiring explicit knowledge of the joint dynamics of portfolio assets. Among DRL algorithms, the combination of actor-critic algorithms and deep function approximators is the most widely used. Here, we find that training a DRL agent with an actor-critic algorithm and deep function approximators may lead to scenarios where the improvement in the agent's risk-adjusted profitability is insignificant. We argue that such situations primarily arise from two problems: sparsity of positive rewards and the curse of dimensionality. These limitations prevent DRL agents from comprehensively learning asset price change patterns in the training environment. As a result, the agents cannot effectively explore dynamic portfolio optimization policies that improve risk-adjusted profitability during training. To address these problems, we propose a novel multi-agent learning system based on the hierarchical deep reinforcement learning (HDRL) algorithmic framework. Under this framework, the agents work together as a learning system for portfolio optimization. Specifically, by designing an auxiliary agent that works together with the executive agent for optimal policy exploration, the learning system can focus on exploring policies with higher risk-adjusted return in the part of the action space with positive return and low variance. The performance of the proposed learning system is evaluated using a portfolio of 29 stocks from the Dow Jones index in four different experiments. During training, the objective functions of the actor and critic both ultimately achieve stable convergence.
The risk-adjusted profitability of our learning system in the training environment is significantly improved. Hence, we prove that the policies executed by our learning system in out-of-sample experiments originate from the DRL agents' comprehensive learning of asset price change patterns in the training environment. Furthermore, we find that adopting the auxiliary agent and the HDRL training algorithm can efficiently overcome the curse of dimensionality and improve training efficiency in an environment where positive rewards are sparse. In each back-test experiment, the proposed learning system is compared with sixteen traditional strategies and ten strategies based on machine learning algorithms in terms of profitability and risk control. The empirical results in the four evaluation experiments demonstrate the efficacy of our learning system, which outperforms all other strategies by at least 8.2% in terms of Sharpe ratio, Sortino ratio, and Calmar ratio. This indicates that the policies learned in the training environment exhibit excellent generalization ability in the back-testing experiments.
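The abstract reports results in terms of the Sharpe, Sortino, and Calmar ratios without defining them in this record. As a rough illustration only, the sketch below computes the three metrics from a series of per-period simple returns using common textbook conventions. The function names, the zero risk-free rate and target, the population standard deviation, and the 252-period year are all assumptions made here and are not taken from the article, whose exact definitions may differ.

```python
import math
from statistics import mean, pstdev


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - risk_free for r in returns]
    sd = pstdev(excess)  # population standard deviation (an assumed convention)
    return mean(excess) / sd if sd > 0 else float("nan")


def sortino_ratio(returns, target=0.0):
    """Per-period Sortino ratio: mean excess return over downside deviation."""
    excess = [r - target for r in returns]
    # Only returns below the target contribute to downside deviation.
    downside = math.sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    return mean(excess) / downside if downside > 0 else float("nan")


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded wealth curve, as a fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst


def calmar_ratio(returns, periods_per_year=252):
    """Annualized compounded return divided by maximum drawdown."""
    total = math.prod(1.0 + r for r in returns)
    annualized = total ** (periods_per_year / len(returns)) - 1.0
    mdd = max_drawdown(returns)
    return annualized / mdd if mdd > 0 else float("nan")
```

All three ratios divide a return measure by a risk measure, so a higher value means more return per unit of risk. They differ in the risk measure: total volatility for Sharpe, downside volatility for Sortino, and maximum drawdown for Calmar.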
ISSN: 2199-4536, 2198-6053
Author Affiliations:
Ruoyu Sun: School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University
Yue Xi: Department of Educational Studies, School of Academy of Future Education, Xi’an Jiaotong-Liverpool University
Angelos Stefanidis: School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University
Zhengyong Jiang: School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University
Jionglong Su: School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University
Subjects: Hierarchical deep reinforcement learning; Portfolio optimization; Learning system; Multi-agent