Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling
Representation learning plays a vital role in autonomous driving by extracting meaningful features from raw sensory inputs. World models emerge as an effective approach to representation learning by capturing predictive features that can anticipate multiple possible futures, which is particularly suited for driving scenarios. However, existing world model approaches face two critical limitations: First, conventional methods rely heavily on computationally expensive variational inference that requires decoding back to high-dimensional observation space. Second, current end-to-end autonomous driving systems demand extensive labeled data for training, resulting in prohibitive annotation costs. To address these challenges, we present BYOL-Drive, a novel method that firstly introduces the self-supervised representation-learning paradigm BYOL (Bootstrap Your Own Latent) to implement world modeling. Our method eliminates the computational burden of observation space decoding while requiring substantially fewer labeled data compared to mainstream approaches. Additionally, our model only relies on monocular camera images as input, making it easy to deploy and generalize. Based on this learned representation, experiments on the standard closed-loop CARLA benchmark demonstrate that our BYOL-Drive achieves competitive performance with improved computational efficiency and significantly reduced annotation requirements compared to the state-of-the-art methods. Our work contributes to the development of end-to-end autonomous driving.
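The core idea the abstract describes — predicting the *latent* representation of a future frame with a slowly updated target network, rather than decoding back to pixel space — can be sketched as follows. This is a minimal NumPy illustration of a BYOL-style latent world-model objective, not the paper's implementation: the encoder, predictor, dimensions, and EMA rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for the sketch.
obs_dim, latent_dim = 16, 8

def encode(weights, x):
    """Stand-in encoder: one nonlinear layer mapping observations to latents."""
    return np.tanh(x @ weights)

# Online encoder, its EMA target copy, and a latent-space prediction head.
online_w = 0.1 * rng.normal(size=(obs_dim, latent_dim))
target_w = online_w.copy()          # target starts as a copy of the online net
pred_w = np.eye(latent_dim)         # predictor maps current latent -> next latent

def byol_world_model_loss(obs_t, obs_t1):
    """Predict the target network's latent of the next frame.

    No decoder back to observation space is needed: the loss is a
    negative-cosine-similarity term (as in BYOL) between the predicted
    next latent and the target encoder's latent of the actual next frame.
    """
    z_online = encode(online_w, obs_t)
    z_pred = z_online @ pred_w                  # predicted next-step latent
    z_target = encode(target_w, obs_t1)         # stop-gradient branch
    num = np.sum(z_pred * z_target, axis=-1)
    den = np.linalg.norm(z_pred, axis=-1) * np.linalg.norm(z_target, axis=-1)
    return float(np.mean(2.0 - 2.0 * num / (den + 1e-8)))

def ema_update(tau=0.99):
    """Target network tracks the online network by exponential moving average."""
    global target_w
    target_w = tau * target_w + (1.0 - tau) * online_w

# A batch of consecutive frames (random stand-ins for camera images).
obs_t = rng.normal(size=(4, obs_dim))
obs_t1 = obs_t + 0.01 * rng.normal(size=(4, obs_dim))
loss = byol_world_model_loss(obs_t, obs_t1)
ema_update()
```

The loss lives in [0, 4] and is near 0 when predicted and target latents align; a real training loop would backpropagate through the online branch only and refresh the target via `ema_update` each step.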
| Main Authors: | Haoqiang Chen, Yadong Liu, Dewen Hu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Machines |
| Subjects: | autonomous driving, representation learning, world model, imitation learning |
| Online Access: | https://www.mdpi.com/2075-1702/13/3/231 |
| _version_ | 1850205483667816448 |
|---|---|
| author | Haoqiang Chen Yadong Liu Dewen Hu |
| author_facet | Haoqiang Chen Yadong Liu Dewen Hu |
| author_sort | Haoqiang Chen |
| collection | DOAJ |
| description | Representation learning plays a vital role in autonomous driving by extracting meaningful features from raw sensory inputs. World models emerge as an effective approach to representation learning by capturing predictive features that can anticipate multiple possible futures, which is particularly suited for driving scenarios. However, existing world model approaches face two critical limitations: First, conventional methods rely heavily on computationally expensive variational inference that requires decoding back to high-dimensional observation space. Second, current end-to-end autonomous driving systems demand extensive labeled data for training, resulting in prohibitive annotation costs. To address these challenges, we present BYOL-Drive, a novel method that firstly introduces the self-supervised representation-learning paradigm BYOL (Bootstrap Your Own Latent) to implement world modeling. Our method eliminates the computational burden of observation space decoding while requiring substantially fewer labeled data compared to mainstream approaches. Additionally, our model only relies on monocular camera images as input, making it easy to deploy and generalize. Based on this learned representation, experiments on the standard closed-loop CARLA benchmark demonstrate that our BYOL-Drive achieves competitive performance with improved computational efficiency and significantly reduced annotation requirements compared to the state-of-the-art methods. Our work contributes to the development of end-to-end autonomous driving. |
| format | Article |
| id | doaj-art-b682860f4e4445b4a7419f093f56f356 |
| institution | OA Journals |
| issn | 2075-1702 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Machines |
| spelling | doaj-art-b682860f4e4445b4a7419f093f56f3562025-08-20T02:11:04ZengMDPI AGMachines2075-17022025-03-0113323110.3390/machines13030231Representation Learning for Vision-Based Autonomous Driving via Probabilistic World ModelingHaoqiang Chen0Yadong Liu1Dewen Hu2College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, ChinaCollege of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, ChinaCollege of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, ChinaRepresentation learning plays a vital role in autonomous driving by extracting meaningful features from raw sensory inputs. World models emerge as an effective approach to representation learning by capturing predictive features that can anticipate multiple possible futures, which is particularly suited for driving scenarios. However, existing world model approaches face two critical limitations: First, conventional methods rely heavily on computationally expensive variational inference that requires decoding back to high-dimensional observation space. Second, current end-to-end autonomous driving systems demand extensive labeled data for training, resulting in prohibitive annotation costs. To address these challenges, we present BYOL-Drive, a novel method that firstly introduces the self-supervised representation-learning paradigm BYOL (Bootstrap Your Own Latent) to implement world modeling. Our method eliminates the computational burden of observation space decoding while requiring substantially fewer labeled data compared to mainstream approaches. Additionally, our model only relies on monocular camera images as input, making it easy to deploy and generalize. 
Based on this learned representation, experiments on the standard closed-loop CARLA benchmark demonstrate that our BYOL-Drive achieves competitive performance with improved computational efficiency and significantly reduced annotation requirements compared to the state-of-the-art methods. Our work contributes to the development of end-to-end autonomous driving.https://www.mdpi.com/2075-1702/13/3/231autonomous drivingrepresentation learningworld modelimitation learning |
| spellingShingle | Haoqiang Chen Yadong Liu Dewen Hu Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling Machines autonomous driving representation learning world model imitation learning |
| title | Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling |
| title_full | Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling |
| title_fullStr | Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling |
| title_full_unstemmed | Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling |
| title_short | Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling |
| title_sort | representation learning for vision based autonomous driving via probabilistic world modeling |
| topic | autonomous driving representation learning world model imitation learning |
| url | https://www.mdpi.com/2075-1702/13/3/231 |
| work_keys_str_mv | AT haoqiangchen representationlearningforvisionbasedautonomousdrivingviaprobabilisticworldmodeling AT yadongliu representationlearningforvisionbasedautonomousdrivingviaprobabilisticworldmodeling AT dewenhu representationlearningforvisionbasedautonomousdrivingviaprobabilisticworldmodeling |