A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning
| Main Authors: | Ruoyu Sun, Yue Xi, Angelos Stefanidis, Zhengyong Jiang, Jionglong Su |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | Hierarchical deep reinforcement learning; Portfolio optimization; Learning system; Multi-agent |
| Online Access: | https://doi.org/10.1007/s40747-025-01884-y |
| _version_ | 1850207476843020288 |
|---|---|
| author | Ruoyu Sun; Yue Xi; Angelos Stefanidis; Zhengyong Jiang; Jionglong Su |
| author_facet | Ruoyu Sun; Yue Xi; Angelos Stefanidis; Zhengyong Jiang; Jionglong Su |
| author_sort | Ruoyu Sun |
| collection | DOAJ |
| description | Abstract Deep reinforcement learning (DRL) has been extensively used to address portfolio optimization problems. DRL agents acquire knowledge and make decisions through unsupervised interactions with their environment, without requiring explicit knowledge of the joint dynamics of portfolio assets. Among DRL algorithms, the combination of actor-critic methods with deep function approximators is the most widely used. We find that training a DRL agent with an actor-critic algorithm and deep function approximators may lead to scenarios in which the improvement in the agent's risk-adjusted profitability is insignificant. We argue that such situations arise primarily from two problems: sparsity of positive rewards and the curse of dimensionality. These limitations prevent DRL agents from comprehensively learning asset price change patterns in the training environment; as a result, the agents cannot effectively explore dynamic portfolio optimization policies that improve risk-adjusted profitability during training. To address these problems, we propose a novel multi-agent learning system based on a hierarchical deep reinforcement learning (HDRL) algorithmic framework. Under this framework, the agents work together as a learning system for portfolio optimization. Specifically, by designing an auxiliary agent that works with the executive agent to explore the optimal policy, the learning system can focus its exploration on policies with higher risk-adjusted return within the region of the action space yielding positive returns and low variance. The performance of the proposed learning system is evaluated on a portfolio of 29 stocks from the Dow Jones index in four different experiments. During training, the objective functions of both the actor and the critic ultimately achieve stable convergence, and the risk-adjusted profitability of our learning system in the training environment is significantly improved. Hence, we demonstrate that the policies executed by our learning system in the out-of-sample experiments originate from the DRL agents' comprehensive learning of asset price change patterns in the training environment. Furthermore, we find that adopting the auxiliary agent and the HDRL training algorithm can efficiently overcome the curse of dimensionality and improve training efficiency in an environment with sparse positive rewards. In each back-test experiment, the proposed learning system is compared with sixteen traditional strategies and ten machine-learning-based strategies in terms of profitability and risk control. The empirical results of the four evaluation experiments demonstrate the efficacy of our learning system, which outperforms all other strategies by at least 8.2% in terms of the Sharpe ratio, Sortino ratio, and Calmar ratio. This indicates that the policies learned in the training environment exhibit excellent generalization ability in the back-testing experiments. |
| format | Article |
| id | doaj-art-ec6be2ea4b9d468aa9fb76a5290d2c14 |
| institution | OA Journals |
| issn | 2199-4536; 2198-6053 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Springer |
| record_format | Article |
| series | Complex & Intelligent Systems |
| spelling | Complex & Intelligent Systems, vol. 11, no. 7, art. 141, published 2025-05-01, https://doi.org/10.1007/s40747-025-01884-y. Author affiliations: Ruoyu Sun, Angelos Stefanidis, Zhengyong Jiang, and Jionglong Su: School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University; Yue Xi: Department of Educational Studies, School of Academy of Future Education, Xi’an Jiaotong-Liverpool University. |
| title | A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning |
| topic | Hierarchical deep reinforcement learning; Portfolio optimization; Learning system; Multi-agent |
| url | https://doi.org/10.1007/s40747-025-01884-y |
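The abstract reports performance in terms of the Sharpe, Sortino, and Calmar ratios. For reference, below is a minimal sketch of how these standard risk-adjusted metrics are commonly computed from a series of periodic portfolio returns. It is not code from the paper: the 252-day annualization convention, the zero risk-free-rate default, and all function names are illustrative assumptions.

```python
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    # Mean per-period excess return over its standard deviation, annualized.
    excess = np.asarray(returns) - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def sortino_ratio(returns, risk_free=0.0, periods_per_year=252):
    # Like Sharpe, but only downside deviations enter the denominator.
    excess = np.asarray(returns) - risk_free / periods_per_year
    downside_dev = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return np.sqrt(periods_per_year) * excess.mean() / downside_dev

def calmar_ratio(returns, periods_per_year=252):
    # Annualized compound return divided by maximum drawdown of the equity curve.
    wealth = np.cumprod(1.0 + np.asarray(returns))
    annual_return = wealth[-1] ** (periods_per_year / len(wealth)) - 1.0
    max_drawdown = np.max(1.0 - wealth / np.maximum.accumulate(wealth))
    return annual_return / max_drawdown

# Example on synthetic daily returns (roughly three years, purely illustrative):
rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=5e-4, scale=0.01, size=756)
print(sharpe_ratio(daily_returns),
      sortino_ratio(daily_returns),
      calmar_ratio(daily_returns))
```

Under these conventions, the Sharpe ratio normalizes excess return by total volatility, the Sortino ratio by downside volatility only, and the Calmar ratio by the worst peak-to-trough loss, which is why the abstract cites all three as complementary measures of risk-adjusted profitability.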