A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data


Bibliographic Details
Main Authors: Zizhe Zhou, Liman Zhang, Xuran Liu, Siyang He, Jingxuan Zhang, Jinzhi Zhu, Yuanping Pang, Chunli Lv
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/11/6215
_version_ 1850129635731308544
author Zizhe Zhou
Liman Zhang
Xuran Liu
Siyang He
Jingxuan Zhang
Jinzhi Zhu
Yuanping Pang
Chunli Lv
author_facet Zizhe Zhou
Liman Zhang
Xuran Liu
Siyang He
Jingxuan Zhang
Jinzhi Zhu
Yuanping Pang
Chunli Lv
author_sort Zizhe Zhou
collection DOAJ
description A deep reinforcement learning framework is presented for strategy generation and profit forecasting based on large-scale economic behavior data. By integrating perturbation-based augmentation, backward return estimation, and policy-stabilization mechanisms, the framework facilitates robust modeling and optimization of complex, dynamic behavior sequences. Experimental evaluations on four distinct behavior data subsets indicate that the proposed method achieved consistent performance improvements over representative baseline models across key metrics, including total profit gain, average reward, policy stability, and profit–price correlation. On the sales feedback dataset, the framework achieved a total profit gain of 0.37, an average reward of 4.85, a low action standard deviation of 0.37, and a correlation score of R² = 0.91. In the overall benchmark comparison, the model attained a precision of 0.92 and a recall of 0.89, reflecting reliable strategy response and predictive consistency. These results suggest that the proposed method can effectively handle decision-making scenarios involving sparse feedback, heterogeneous behavior, and temporal volatility, with demonstrable generalization potential and practical relevance.
format Article
id doaj-art-69b6dff8daad4ecbb21478ee1e284b2f
institution OA Journals
issn 2076-3417
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-69b6dff8daad4ecbb21478ee1e284b2f 2025-08-20T02:32:54Z
eng | MDPI AG | Applied Sciences | ISSN 2076-3417 | 2025-05-01 | vol. 15, no. 11, art. 6215 | doi:10.3390/app15116215
A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data
Zizhe Zhou, Liman Zhang, Xuran Liu, Siyang He, Jingxuan Zhang, Jinzhi Zhu, Yuanping Pang, Chunli Lv (China Agricultural University, Beijing 100083, China)
https://www.mdpi.com/2076-3417/15/11/6215
deep reinforcement learning; transformer-based neural architectures; stability-constrained deep learning; sequential decision-making networks
spellingShingle Zizhe Zhou
Liman Zhang
Xuran Liu
Siyang He
Jingxuan Zhang
Jinzhi Zhu
Yuanping Pang
Chunli Lv
A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data
Applied Sciences
deep reinforcement learning
transformer-based neural architectures
stability-constrained deep learning
sequential decision-making networks
title A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data
title_full A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data
title_fullStr A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data
title_full_unstemmed A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data
title_short A Transformer-Based Reinforcement Learning Framework for Sequential Strategy Optimization in Sparse Data
title_sort transformer based reinforcement learning framework for sequential strategy optimization in sparse data
topic deep reinforcement learning
transformer-based neural architectures
stability-constrained deep learning
sequential decision-making networks
url https://www.mdpi.com/2076-3417/15/11/6215
work_keys_str_mv AT zizhezhou atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT limanzhang atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT xuranliu atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT siyanghe atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT jingxuanzhang atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT jinzhizhu atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT yuanpingpang atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT chunlilv atransformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT zizhezhou transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT limanzhang transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT xuranliu transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT siyanghe transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT jingxuanzhang transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT jinzhizhu transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT yuanpingpang transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata
AT chunlilv transformerbasedreinforcementlearningframeworkforsequentialstrategyoptimizationinsparsedata