Multi-Task Trajectory Prediction Using a Vehicle-Lane Disentangled Conditional Variational Autoencoder

Trajectory prediction under multimodal information is critical for autonomous driving, necessitating the integration of dynamic vehicle states and static high-definition (HD) maps to model complex agent–scene interactions effectively. However, existing methods often employ static scene encodings and...

Full description

Saved in:
Bibliographic Details
Main Authors: Haoyang Chen, Na Li, Hangguan Shan, Eryun Liu, Zhiyu Xiang
Format: Article
Language:English
Published: MDPI AG 2025-07-01
Series:Sensors
Subjects:
Online Access:https://www.mdpi.com/1424-8220/25/14/4505
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Trajectory prediction under multimodal information is critical for autonomous driving, necessitating the integration of dynamic vehicle states and static high-definition (HD) maps to model complex agent–scene interactions effectively. However, existing methods often employ static scene encodings and unstructured latent spaces, limiting their ability to capture evolving spatial contexts and produce diverse yet contextually coherent predictions. To tackle these challenges, we propose <i>MS-SLV</i>, a novel generative framework that introduces (1) a time-aware scene encoder that aligns HD map features with vehicle motion to capture evolving scene semantics and (2) a structured latent model that explicitly disentangles agent-specific intent and scene-level constraints. Additionally, we introduce an auxiliary lane prediction task to provide targeted supervision for scene understanding and improve latent variable learning. Our approach jointly predicts future trajectories and lane sequences, enabling more interpretable and scene-consistent forecasts. Extensive evaluations on the nuScenes dataset demonstrate the effectiveness of MS-SLV, achieving a 12.37% reduction in average displacement error and a 7.67% reduction in final displacement error over state-of-the-art methods. Moreover, MS-SLV significantly improves multi-modal prediction, reducing the top-5 Miss Rate (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msub><mi>MR</mi><mn>5</mn></msub></semantics></math></inline-formula>) and top-10 Miss Rate (<inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msub><mi>MR</mi><mn>10</mn></msub></semantics></math></inline-formula>) by 26% and 33%, respectively, and lowering the Off-Road Rate (ORR) by 3%, as compared with the strongest baseline in our evaluation.
ISSN:1424-8220