Offline reinforcement learning combining generalized advantage estimation and modality decomposition interaction
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-98572-1 |
| Summary: | Transformers show great potential in offline reinforcement learning via trajectory sequence modeling for action prediction. However, existing Transformer-based methods face limitations such as ineffective trajectory stitching and the neglect of deep interactions within and between the modalities of trajectory information. We propose CGM, an offline reinforcement learning approach that combines Generalized Advantage Estimation with Modality Decomposition Interaction (MDI) to address these challenges. Generalized Advantage Estimation relabels the dataset to improve the effectiveness of trajectory stitching (a minimal sketch of the standard estimator follows this record). MDI consists of an encoder and a decoder. The encoder integrates an intra-modal interaction mechanism based on ConvFormer and an inter-modal interaction mechanism based on a dual-Transformer architecture, enabling information exchange within and across modalities. In intra-modal interaction, the convolutional properties of ConvFormer capture the associative information within the state and action modalities. In inter-modal interaction, the dual-Transformer architecture exchanges multimodal information for states and actions separately, exploring latent correlations between the modalities to achieve deep cross-modal interaction. The decoder uses advantage values to optimize action prediction. We compared CGM with state-of-the-art baselines on the D4RL benchmark; on the MuJoCo tasks, CGM outperforms the strongest baseline by 2.89%. |
| ISSN: | 2045-2322 |
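
As referenced in the summary above, the sketch below shows the standard Generalized Advantage Estimation recursion (Schulman et al., 2015), which CGM uses to relabel the offline dataset. This record does not describe how CGM obtains its value estimates or how the relabeled quantities enter training, so the function name, array shapes, and usage example here are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard Generalized Advantage Estimation (Schulman et al., 2015).

    rewards: shape (T,)   -- per-step rewards of one stored trajectory
    values:  shape (T+1,) -- value estimates V(s_0), ..., V(s_T); use 0 for
                             V(s_T) at a terminal state
    Computes, in a single backward pass,
        delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        A_t     = delta_t + gamma * lam * A_{t+1},
    which equals the discounted sum A_t = sum_{l>=0} (gamma*lam)^l * delta_{t+l}.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Hypothetical usage: attach advantage labels to one offline trajectory; the
# summary states that CGM's decoder conditions action prediction on such
# advantage values.
rewards = np.array([1.0, 0.5, 0.0, 2.0])
values = np.array([1.2, 0.9, 0.7, 1.5, 0.0])  # terminal state: V(s_T) = 0
advantages = gae_advantages(rewards, values)
```

The `lam` parameter trades bias against variance: `lam = 0` reduces the estimator to the one-step TD error, while `lam = 1` recovers the Monte Carlo advantage; 0.95 is a common default.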