MSFA-BEVNet: Optimization of BEV Scene Recognition Driven by Multiscale Feature Fusion and Alignment

Scene understanding and multisource data fusion are critical challenges in autonomous driving systems. In particular, optimizing information fusion strategies for three-dimensional Bird’s Eye View (BEV) scene recognition tasks is crucial for accurate perception and decision-making in...


Bibliographic Details
Main Authors: Xiubin Cao, Yifan Li, Hongwei Li
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10979852/
_version_ 1850191888494100480
author Xiubin Cao
Yifan Li
Hongwei Li
author_facet Xiubin Cao
Yifan Li
Hongwei Li
author_sort Xiubin Cao
collection DOAJ
description Scene understanding and multisource data fusion are critical challenges in autonomous driving systems. In particular, optimizing information fusion strategies for three-dimensional Bird’s Eye View (BEV) scene recognition tasks is crucial for accurate perception and decision-making in dynamic environments. This study proposes a novel architecture that integrates multiscale feature extraction and cross-modal structural alignment to enhance the representation and detection capabilities of BEV features. Specifically, we employ a DCN-based block for visual feature extraction, comprising layer normalization (LN), feedforward networks (FFNs), and the Gaussian Error Linear Unit (GELU) activation function, aligned with the Vision Transformer (ViT) paradigm to improve feature modeling. To fully utilize multiscale information, a dedicated multiscale feature fusion block is introduced to extract expressive scene features within the feature space. Furthermore, we leverage LiDAR to generate LiDAR BEV features and propose a feature alignment block to enhance the complementarity between camera and LiDAR BEV features. The proposed architecture effectively supports precise scene recognition and adaptive decision-making in multi-sensor fusion environments, providing robust perception capabilities for autonomous driving in complex scenarios.
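The abstract describes the visual block as combining LN, an FFN, and GELU in the ViT (pre-norm residual) style. The following is a minimal NumPy sketch of that generic pre-norm pattern only; the deformable convolution (DCN), the multiscale fusion, and the alignment blocks are omitted, and all weight shapes and names here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    # normalize over the last (channel) dimension, as LN does in ViT blocks
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ffn_block(x, w1, b1, w2, b2):
    # pre-norm residual sub-block: x + FFN(LN(x)), FFN = Linear -> GELU -> Linear
    h = gelu(layer_norm(x) @ w1 + b1)
    return x + h @ w2 + b2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2, 6, 4))               # (batch, tokens, channels) -- illustrative sizes
    w1, b1 = 0.1 * rng.normal(size=(4, 8)), np.zeros(8)
    w2, b2 = 0.1 * rng.normal(size=(8, 4)), np.zeros(4)
    print(ffn_block(x, w1, b1, w2, b2).shape)    # (2, 6, 4): the residual path preserves shape
```

In the full model this FFN sub-block would follow a DCN-based feature-extraction step rather than self-attention, which is the substitution the abstract attributes to MSFA-BEVNet.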
format Article
id doaj-art-a933693874734175985eebb154d6dfda
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-a933693874734175985eebb154d6dfda
timestamp 2025-08-20T02:14:45Z
language eng
publisher IEEE
series IEEE Access (ISSN 2169-3536)
published 2025-01-01, vol. 13, pp. 75707-75717
doi 10.1109/ACCESS.2025.3565328 (IEEE document 10979852)
title MSFA-BEVNet: Optimization of BEV Scene Recognition Driven by Multiscale Feature Fusion and Alignment
author Xiubin Cao (https://orcid.org/0009-0002-6438-3058), School of Geo-Science and Technology, Zhengzhou University, Zhengzhou, China
author Yifan Li, Institute for Geophysics and Meteorology, University of Cologne, Cologne, Germany
author Hongwei Li (https://orcid.org/0000-0001-6231-4126), School of Geo-Science and Technology, Zhengzhou University, Zhengzhou, China
description (identical to the description field above)
url https://ieeexplore.ieee.org/document/10979852/
keywords Autonomous driving; bird’s eye view; multiscale feature fusion; LiDAR; feature alignment
spellingShingle Xiubin Cao
Yifan Li
Hongwei Li
MSFA-BEVNet: Optimization of BEV Scene Recognition Driven by Multiscale Feature Fusion and Alignment
IEEE Access
Autonomous driving
bird’s eye view
multiscale feature fusion
LiDAR
feature alignment
title MSFA-BEVNet: Optimization of BEV Scene Recognition Driven by Multiscale Feature Fusion and Alignment
title_full MSFA-BEVNet: Optimization of BEV Scene Recognition Driven by Multiscale Feature Fusion and Alignment
title_fullStr MSFA-BEVNet: Optimization of BEV Scene Recognition Driven by Multiscale Feature Fusion and Alignment
title_full_unstemmed MSFA-BEVNet: Optimization of BEV Scene Recognition Driven by Multiscale Feature Fusion and Alignment
title_short MSFA-BEVNet: Optimization of BEV Scene Recognition Driven by Multiscale Feature Fusion and Alignment
title_sort msfa bevnet optimization of bev scene recognition driven by multiscale feature fusion and alignment
topic Autonomous driving
bird’s eye view
multiscale feature fusion
LiDAR
feature alignment
url https://ieeexplore.ieee.org/document/10979852/
work_keys_str_mv AT xiubincao msfabevnetoptimizationofbevscenerecognitiondrivenbymultiscalefeaturefusionandalignment
AT yifanli msfabevnetoptimizationofbevscenerecognitiondrivenbymultiscalefeaturefusionandalignment
AT hongweili msfabevnetoptimizationofbevscenerecognitiondrivenbymultiscalefeaturefusionandalignment