LST-BEV: Generating a Long-Term Spatial–Temporal Bird’s-Eye-View Feature for Multi-View 3D Object Detection
This paper presents a novel multi-view 3D object detection framework, Long-Term Spatial–Temporal Bird’s-Eye View (LST-BEV), designed to improve performance in autonomous driving. Traditional 3D detection relies on sensors like LiDAR, but visual perception using multi-camera systems is emerging as a...
| Main Authors: | Qijun Feng, Chunyang Zhao, Pengfei Liu, Zhichao Zhang, Yue Jin, Wanglin Tian |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Sensors |
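| Subjects: | autonomous driving; bird’s-eye view (BEV); 3D object detection; large kernel convolution; long-term temporal features |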
| Online Access: | https://www.mdpi.com/1424-8220/25/13/4040 |
| _version_ | 1849319792366845952 |
|---|---|
| author | Qijun Feng, Chunyang Zhao, Pengfei Liu, Zhichao Zhang, Yue Jin, Wanglin Tian |
| author_sort | Qijun Feng |
| collection | DOAJ |
| description | This paper presents a novel multi-view 3D object detection framework, Long-Term Spatial–Temporal Bird’s-Eye View (LST-BEV), designed to improve performance in autonomous driving. Traditional 3D detection relies on sensors like LiDAR, but visual perception using multi-camera systems is emerging as a more cost-effective solution. Existing methods struggle to capture long-range dependencies and cross-task information because of limitations in their attention mechanisms. To address this, we propose a Long-Range Cross-Task Detection Head (LRCH) to capture these dependencies and integrate cross-task information for accurate predictions. Additionally, we introduce the Long-Term Temporal Perception Module (LTPM), which efficiently extracts temporal features by combining Mamba and linear attention, overcoming challenges in temporal frame extraction. Experimental results on the nuScenes dataset demonstrate that our proposed LST-BEV outperforms its baseline (SA-BEVPool) by 2.1% mAP and 2.7% NDS, indicating a significant performance improvement. |
| format | Article |
| id | doaj-art-9e6a4be48d864c4dbf1688a7cb25868c |
| institution | Kabale University |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-9e6a4be48d864c4dbf1688a7cb25868c; 2025-08-20T03:50:20Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2025-06-01; vol. 25, no. 13, art. 4040; DOI 10.3390/s25134040; LST-BEV: Generating a Long-Term Spatial–Temporal Bird’s-Eye-View Feature for Multi-View 3D Object Detection; Qijun Feng (School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China); Chunyang Zhao (School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China); Pengfei Liu (Shenyang Institute of Automation Chinese Academy of Sciences, Shenyang 110169, China); Zhichao Zhang (School of International Studies, Northeast Normal University, Changchun 130024, China); Yue Jin (School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China); Wanglin Tian (School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China); https://www.mdpi.com/1424-8220/25/13/4040; autonomous driving; bird’s-eye view (BEV); 3D object detection; large kernel convolution; long-term temporal features |
| title | LST-BEV: Generating a Long-Term Spatial–Temporal Bird’s-Eye-View Feature for Multi-View 3D Object Detection |
| topic | autonomous driving; bird’s-eye view (BEV); 3D object detection; large kernel convolution; long-term temporal features |
| url | https://www.mdpi.com/1424-8220/25/13/4040 |