A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data
A deep reinforcement learning framework is presented for strategy generation and profit forecasting based on large-scale economic behavior data. By integrating perturbation-based augmentation, backward return estimation, and policy-stabilization mechanisms, the framework facilitates robust modeling and optimization of complex, dynamic behavior sequences.
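The abstract reports evaluation metrics such as average reward, action standard deviation (used as a policy-stability proxy), and a profit–price correlation of R² = 0.91. As a hedged illustration only (the article's actual evaluation code is not part of this record), these metrics are conventionally computed along the following lines; the function name and inputs are hypothetical:

```python
# Hedged sketch (not from the article): conventional computation of the
# evaluation metrics named in the abstract -- average reward, action
# standard deviation, and the R^2 of a profit-vs-price linear fit.
import numpy as np

def evaluate_policy(rewards, actions, profits, prices):
    """Return (average reward, action std, R^2 of profits vs. prices)."""
    rewards = np.asarray(rewards, dtype=float)
    actions = np.asarray(actions, dtype=float)
    profits = np.asarray(profits, dtype=float)
    prices = np.asarray(prices, dtype=float)

    avg_reward = rewards.mean()
    action_std = actions.std()  # lower spread = more stable policy

    # R^2 of a least-squares fit of profit on price
    slope, intercept = np.polyfit(prices, profits, 1)
    residuals = profits - (slope * prices + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((profits - profits.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return avg_reward, action_std, r2
```

A perfectly linear profit–price relationship yields R² = 1.0; the 0.91 reported in the abstract would correspond to a strong but imperfect fit under this definition.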
Saved in:
| Main Authors: | Zizhe Zhou, Liman Zhang, Xuran Liu, Siyang He, Jingxuan Zhang, Jinzhi Zhu, Yuanping Pang, Chunli Lv |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/11/6215 |
| _version_ | 1850129635731308544 |
|---|---|
| author | Zizhe Zhou Liman Zhang Xuran Liu Siyang He Jingxuan Zhang Jinzhi Zhu Yuanping Pang Chunli Lv |
| author_facet | Zizhe Zhou Liman Zhang Xuran Liu Siyang He Jingxuan Zhang Jinzhi Zhu Yuanping Pang Chunli Lv |
| author_sort | Zizhe Zhou |
| collection | DOAJ |
| description | A deep reinforcement learning framework is presented for strategy generation and profit forecasting based on large-scale economic behavior data. By integrating perturbation-based augmentation, backward return estimation, and policy-stabilization mechanisms, the framework facilitates robust modeling and optimization of complex, dynamic behavior sequences. Experimental evaluations on four distinct behavior data subsets indicate that the proposed method achieved consistent performance improvements over representative baseline models across key metrics, including total profit gain, average reward, policy stability, and profit–price correlation. On the sales feedback dataset, the framework achieved a total profit gain of 0.37, an average reward of 4.85, a low-action standard deviation of 0.37, and a correlation score of <inline-formula><math display="inline"><semantics><mrow><msup><mi>R</mi><mn>2</mn></msup><mo>=</mo><mn>0.91</mn></mrow></semantics></math></inline-formula>. In the overall benchmark comparison, the model attained a precision of 0.92 and a recall of 0.89, reflecting reliable strategy response and predictive consistency. These results suggest that the proposed method is capable of effectively handling decision-making scenarios involving sparse feedback, heterogeneous behavior, and temporal volatility, with demonstrable generalization potential and practical relevance. |
| format | Article |
| id | doaj-art-69b6dff8daad4ecbb21478ee1e284b2f |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-69b6dff8daad4ecbb21478ee1e284b2f2025-08-20T02:32:54ZengMDPI AGApplied Sciences2076-34172025-05-011511621510.3390/app15116215A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse DataZizhe Zhou0Liman Zhang1Xuran Liu2Siyang He3Jingxuan Zhang4Jinzhi Zhu5Yuanping Pang6Chunli Lv7China Agricultural University, Beijing 100083, ChinaChina Agricultural University, Beijing 100083, ChinaChina Agricultural University, Beijing 100083, ChinaChina Agricultural University, Beijing 100083, ChinaChina Agricultural University, Beijing 100083, ChinaChina Agricultural University, Beijing 100083, ChinaChina Agricultural University, Beijing 100083, ChinaChina Agricultural University, Beijing 100083, ChinaA deep reinforcement learning framework is presented for strategy generation and profit forecasting based on large-scale economic behavior data. By integrating perturbation-based augmentation, backward return estimation, and policy-stabilization mechanisms, the framework facilitates robust modeling and optimization of complex, dynamic behavior sequences. Experimental evaluations on four distinct behavior data subsets indicate that the proposed method achieved consistent performance improvements over representative baseline models across key metrics, including total profit gain, average reward, policy stability, and profit–price correlation. On the sales feedback dataset, the framework achieved a total profit gain of 0.37, an average reward of 4.85, a low-action standard deviation of 0.37, and a correlation score of <inline-formula><math display="inline"><semantics><mrow><msup><mi>R</mi><mn>2</mn></msup><mo>=</mo><mn>0.91</mn></mrow></semantics></math></inline-formula>. In the overall benchmark comparison, the model attained a precision of 0.92 and a recall of 0.89, reflecting reliable strategy response and predictive consistency. These results suggest that the proposed method is capable of effectively handling decision-making scenarios involving sparse feedback, heterogeneous behavior, and temporal volatility, with demonstrable generalization potential and practical relevance.https://www.mdpi.com/2076-3417/15/11/6215deep reinforcement learningtransformer-based neural architecturesstability-constrained deep learningsequential decision-making networks |
| spellingShingle | Zizhe Zhou Liman Zhang Xuran Liu Siyang He Jingxuan Zhang Jinzhi Zhu Yuanping Pang Chunli Lv A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data Applied Sciences deep reinforcement learning transformer-based neural architectures stability-constrained deep learning sequential decision-making networks |
| title | A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data |
| title_full | A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data |
| title_fullStr | A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data |
| title_full_unstemmed | A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data |
| title_short | A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data |
| title_sort | transformer based reinforcement learning framework for sequential strategy optimization in sparse data |
| topic | deep reinforcement learning transformer-based neural architectures stability-constrained deep learning sequential decision-making networks |
| url | https://www.mdpi.com/2076-3417/15/11/6215 |
| work_keys_str_mv | AT zizhezhou atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT limanzhang atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT xuranliu atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT siyanghe atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT jingxuanzhang atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT jinzhizhu atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT yuanpingpang atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT chunlilv atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT zizhezhou transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT limanzhang transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT xuranliu transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT siyanghe transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT jingxuanzhang transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT jinzhizhu transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT yuanpingpang transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata AT chunlilv transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata |