LST-BEV: Generating a Long-Term Spatial–Temporal Bird’s-Eye-View Feature for Multi-View 3D Object Detection

Bibliographic Details
Main Authors: Qijun Feng, Chunyang Zhao, Pengfei Liu, Zhichao Zhang, Yue Jin, Wanglin Tian
Format: Article
Language: English
Published: MDPI AG 2025-06-01
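Series: Sensors
Subjects: autonomous driving; bird’s-eye view (BEV); 3D object detection; large kernel convolution; long-term temporal features
Online Access: https://www.mdpi.com/1424-8220/25/13/4040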
author Qijun Feng
Chunyang Zhao
Pengfei Liu
Zhichao Zhang
Yue Jin
Wanglin Tian
collection DOAJ
description This paper presents a novel multi-view 3D object detection framework, Long-Term Spatial–Temporal Bird’s-Eye View (LST-BEV), designed to improve performance in autonomous driving. Traditional 3D detection relies on sensors like LiDAR, but visual perception using multi-camera systems is emerging as a more cost-effective solution. Existing methods struggle to capture long-range dependencies and cross-task information due to limitations in attention mechanisms. To address this, we propose a Long-Range Cross-Task Detection Head (LRCH) to capture these dependencies and integrate cross-task information for accurate predictions. Additionally, we introduce the Long-Term Temporal Perception Module (LTPM), which efficiently extracts temporal features by combining Mamba and linear attention, overcoming challenges in temporal frame extraction. Experimental results on the nuScenes dataset demonstrate that our proposed LST-BEV outperforms its baseline (SA-BEVPool) by 2.1% mAP and 2.7% NDS, indicating a significant performance improvement.
format Article
id doaj-art-9e6a4be48d864c4dbf1688a7cb25868c
institution Kabale University
issn 1424-8220
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-9e6a4be48d864c4dbf1688a7cb25868c
2025-08-20T03:50:20Z
eng
MDPI AG
Sensors, ISSN 1424-8220
2025-06-01, Volume 25, Issue 13, Article 4040
DOI: 10.3390/s25134040
LST-BEV: Generating a Long-Term Spatial–Temporal Bird’s-Eye-View Feature for Multi-View 3D Object Detection
Qijun Feng (School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China)
Chunyang Zhao (School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China)
Pengfei Liu (Shenyang Institute of Automation Chinese Academy of Sciences, Shenyang 110169, China)
Zhichao Zhang (School of International Studies, Northeast Normal University, Changchun 130024, China)
Yue Jin (School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China)
Wanglin Tian (School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China)
https://www.mdpi.com/1424-8220/25/13/4040
Keywords: autonomous driving; bird’s-eye view (BEV); 3D object detection; large kernel convolution; long-term temporal features
title LST-BEV: Generating a Long-Term Spatial–Temporal Bird’s-Eye-View Feature for Multi-View 3D Object Detection
topic autonomous driving
bird’s-eye view (BEV)
3D object detection
large kernel convolution
long-term temporal features
url https://www.mdpi.com/1424-8220/25/13/4040