A multi‐stage model for bird's eye view prediction based on stereo‐matching model and RGB‐D semantic segmentation
Abstract The Bird's‐Eye‐View (BEV) map is a powerful and detailed scene representation for intelligent vehicles that provides both the location and semantic information about nearby objects from a top‐down perspective. BEV map generation is a complex multi‐stage task, and existing methods typically perform poorly for distant scenes.
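The BEV maps described above combine per‐pixel depth and per‐pixel semantics into a top‐down class grid. A minimal sketch of that projection step follows; this is an illustrative reconstruction, not the authors' code, and the function name, camera intrinsics, and grid parameters are all assumptions (a pinhole camera model is assumed, and height above ground is ignored).

```python
import numpy as np

# Illustrative sketch (not the paper's released code): fuse a per-pixel depth
# map and a semantic segmentation map into a sparse top-down (BEV) class grid
# by back-projecting pixels through an assumed pinhole camera. All function
# and parameter names here are hypothetical; height above ground is ignored.

def depth_semantics_to_bev(depth, labels, fx, cx,
                           grid_size=64, cell_m=0.5, max_range_m=32.0):
    """depth: (H, W) metres; labels: (H, W) integer class ids.
    Returns a (grid_size, grid_size) BEV map; -1 marks unobserved cells."""
    H, W = depth.shape
    u, _ = np.meshgrid(np.arange(W), np.arange(H))
    z = depth                      # forward distance from the camera
    x = (u - cx) / fx * z          # lateral offset in the camera frame
    valid = (z > 0) & (z < max_range_m)

    # Map metric (x, z) to grid indices; the ego camera sits at the
    # bottom-centre cell of the grid, looking "up" the map.
    col = (x[valid] / cell_m + grid_size / 2).astype(int)
    row = (grid_size - 1 - z[valid] / cell_m).astype(int)
    inside = (col >= 0) & (col < grid_size) & (row >= 0) & (row < grid_size)

    bev = np.full((grid_size, grid_size), -1, dtype=int)
    bev[row[inside], col[inside]] = labels[valid][inside]
    return bev
```

Distant pixels cover progressively more metric area per pixel, so a BEV map projected this way is increasingly sparse with range, which is why the abstract describes the projected map as "incomplete" and adds a completion network as a final stage.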
Saved in:
| Main Authors: | Zhongyu Rao, Yingfeng Cai, Hai Wang, Long Chen, Yicheng Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2024-12-01 |
| Series: | IET Intelligent Transport Systems |
| Subjects: | autonomous driving; image recognition; learning (artificial intelligence) |
| Online Access: | https://doi.org/10.1049/itr2.12367 |
| _version_ | 1850174915347480576 |
|---|---|
| author | Zhongyu Rao; Yingfeng Cai; Hai Wang; Long Chen; Yicheng Li |
| author_sort | Zhongyu Rao |
| collection | DOAJ |
| description | Abstract The Bird's‐Eye‐View (BEV) map is a powerful and detailed scene representation for intelligent vehicles that provides both the location and semantic information about nearby objects from a top‐down perspective. BEV map generation is a complex multi‐stage task, and existing methods typically perform poorly for distant scenes. The authors therefore introduce a novel multi‐stage model that infers a more accurate BEV map. First, they propose the Adaptive Aggregation with Stereo Mixture Density (AA‐SMD) model, an improved stereo‐matching model that eliminates bleeding artefacts and provides more accurate depth estimation. Next, they employ an RGB‐Depth (RGB‐D) semantic segmentation model to improve segmentation performance and connectivity. The depth map and semantic segmentation map are then combined to create an incomplete BEV map. Finally, they propose a Multi Strip Pooling Unet (MSP‐Unet) model with a hierarchical multi‐scale (HMS) attention module and a strip pooling (SP) module to predict the complete BEV map. The authors evaluate the model on a synthetic dataset generated with the Car Learning to Act (CARLA) simulator. Experimental results demonstrate that the model generates a highly accurate representation of the surrounding environment, achieving a state‐of‐the‐art result of 61.50% Mean Intersection‐over‐Union (MIoU) across eight classes. |
| format | Article |
| id | doaj-art-31a2922ea7f1497c8ecdf97478aa953d |
| institution | OA Journals |
| issn | 1751-956X 1751-9578 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Wiley |
| record_format | Article |
| series | IET Intelligent Transport Systems |
| spelling | Wiley. IET Intelligent Transport Systems, ISSN 1751‐956X / 1751‐9578, 2024‐12‐01, vol. 18, no. 12, pp. 2552–2564, doi:10.1049/itr2.12367. A multi‐stage model for bird's eye view prediction based on stereo‐matching model and RGB‐D semantic segmentation. Zhongyu Rao, Yingfeng Cai, Long Chen, Yicheng Li: Automotive Engineering Research Institute of Jiangsu University, Zhenjiang, People's Republic of China. Hai Wang: School of Automotive and Traffic Engineering of Jiangsu University, Zhenjiang, People's Republic of China. Subjects: autonomous driving; image recognition; learning (artificial intelligence). https://doi.org/10.1049/itr2.12367 |
| title | A multi‐stage model for bird's eye view prediction based on stereo‐matching model and RGB‐D semantic segmentation |
| title_sort | multi stage model for bird s eye view prediction based on stereo matching model and rgb d semantic segmentation |
| topic | autonomous driving; image recognition; learning (artificial intelligence) |
| url | https://doi.org/10.1049/itr2.12367 |
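The 61.50% MIoU figure reported in the abstract is, by the standard definition, the per‐class intersection‐over‐union averaged over the eight semantic classes. A minimal sketch of that metric follows; the function name and the convention of skipping classes absent from both maps are assumptions, not details from the article.

```python
import numpy as np

# Hedged sketch of the reported evaluation metric: Mean Intersection-over-
# Union (MIoU), averaged over the eight semantic classes. This is the
# standard definition, not code released with the article.

def mean_iou(pred, target, num_classes=8):
    """pred, target: integer class-id arrays of the same shape.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)   # per-class IoU
    return float(np.mean(ious))
```

Because each class contributes equally regardless of how many pixels it covers, MIoU rewards accuracy on small classes (e.g. pedestrians) as much as on large ones (e.g. road), which makes it a common headline metric for BEV segmentation.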