Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling

Bibliographic Details
Main Authors: Haoqiang Chen, Yadong Liu, Dewen Hu
Format: Article
Language: English
Published: MDPI AG, 2025-03-01
Series: Machines
Subjects: autonomous driving; representation learning; world model; imitation learning
Online Access: https://www.mdpi.com/2075-1702/13/3/231
author Haoqiang Chen
Yadong Liu
Dewen Hu
collection DOAJ
description Representation learning plays a vital role in autonomous driving by extracting meaningful features from raw sensory inputs. World models have emerged as an effective approach to representation learning, capturing predictive features that can anticipate multiple possible futures, which is particularly well suited to driving scenarios. However, existing world model approaches face two critical limitations. First, conventional methods rely heavily on computationally expensive variational inference that requires decoding back to the high-dimensional observation space. Second, current end-to-end autonomous driving systems demand extensive labeled data for training, resulting in prohibitive annotation costs. To address these challenges, we present BYOL-Drive, a novel method that is the first to introduce the self-supervised representation-learning paradigm BYOL (Bootstrap Your Own Latent) to world modeling. Our method eliminates the computational burden of observation-space decoding while requiring substantially less labeled data than mainstream approaches. Additionally, our model relies only on monocular camera images as input, making it easy to deploy and generalize. Building on this learned representation, experiments on the standard closed-loop CARLA benchmark demonstrate that BYOL-Drive achieves competitive performance with improved computational efficiency and significantly reduced annotation requirements compared to state-of-the-art methods. Our work contributes to the development of end-to-end autonomous driving.
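For context on the BYOL paradigm the abstract builds on: BYOL trains an online network to predict a target network's projection of another view of the same input, and updates the target by an exponential moving average rather than gradient descent, so no decoding back to observation space is needed. The sketch below is a generic illustration of the published BYOL loss and target update, not the paper's BYOL-Drive implementation; the NumPy framing, function names, and array shapes are assumptions.

```python
import numpy as np

def byol_loss(online_pred: np.ndarray, target_proj: np.ndarray) -> float:
    """BYOL regression loss: mean squared error between L2-normalized
    online predictions and (stop-gradient) target projections.
    Per sample this equals 2 - 2 * cosine_similarity."""
    p = online_pred / np.linalg.norm(online_pred, axis=-1, keepdims=True)
    z = target_proj / np.linalg.norm(target_proj, axis=-1, keepdims=True)
    return float(np.mean(np.sum((p - z) ** 2, axis=-1)))

def ema_update(target_params, online_params, tau: float = 0.996):
    """Target-network weights track the online network via an
    exponential moving average instead of backpropagation."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]
```

Because the loss is computed entirely in the latent space, there is no pixel-space reconstruction term, which is the computational saving the abstract refers to.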
format Article
id doaj-art-b682860f4e4445b4a7419f093f56f356
institution OA Journals
issn 2075-1702
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Machines
doi 10.3390/machines13030231
volume 13
issue 3
article_number 231
affiliation College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China (all three authors)
title Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling
topic autonomous driving
representation learning
world model
imitation learning
url https://www.mdpi.com/2075-1702/13/3/231