MonoDFNet: Monocular 3D Object Detection with Depth Fusion and Adaptive Optimization

Monocular 3D object detection refers to detecting 3D objects using a single camera. This approach offers low sensor costs, high resolution, and rich texture information, making it widely adopted. However, monocular sensors face challenges from environmental factors like occlusion and truncation, lea...

Bibliographic Details
Main Authors: Yuhan Gao, Peng Wang, Xiaoyan Li, Mengyu Sun, Ruohai Di, Liangliang Li, Wei Hong
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Sensors
Subjects: deep learning; monocular 3D detection; depth estimation
Online Access: https://www.mdpi.com/1424-8220/25/3/760
author Yuhan Gao
Peng Wang
Xiaoyan Li
Mengyu Sun
Ruohai Di
Liangliang Li
Wei Hong
author_sort Yuhan Gao
collection DOAJ
description Monocular 3D object detection refers to detecting 3D objects using a single camera. This approach offers low sensor costs, high resolution, and rich texture information, making it widely adopted. However, monocular sensors face challenges from environmental factors like occlusion and truncation, leading to reduced detection accuracy. Additionally, the lack of depth information poses significant challenges for predicting 3D positions. To address these issues, this paper presents a monocular 3D object detection method based on improvements to MonoCD, designed to enhance detection accuracy and robustness in complex environments. In order to effectively obtain and integrate depth information, this paper designs a multi-branch depth prediction with weight sharing module. Furthermore, an adaptive focus mechanism is proposed to emphasize target regions while minimizing interference from irrelevant areas. The experimental results demonstrate that MonoDFNet achieves significant improvements over existing methods, with AP3D gains of +4.09% (Easy), +2.78% (Moderate), and +1.63% (Hard), confirming its effectiveness in 3D object detection.
format Article
id doaj-art-28c219e88f9449d28a6abefb6dd9ab96
institution OA Journals
issn 1424-8220
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-28c219e88f9449d28a6abefb6dd9ab96
2025-08-20T02:12:29Z
eng
MDPI AG
Sensors
1424-8220
2025-01-01
volume 25, issue 3, article 760
10.3390/s25030760
MonoDFNet: Monocular 3D Object Detection with Depth Fusion and Adaptive Optimization
Yuhan Gao (School of Electronics Information Engineering, Xi’an Technological University, Xi’an 710021, China)
Peng Wang (School of Electronics Information Engineering, Xi’an Technological University, Xi’an 710021, China)
Xiaoyan Li (School of Electronics Information Engineering, Xi’an Technological University, Xi’an 710021, China)
Mengyu Sun (School of Optoelectronic Engineering, Xi’an Technological University, Xi’an 710021, China)
Ruohai Di (School of Electronics Information Engineering, Xi’an Technological University, Xi’an 710021, China)
Liangliang Li (School of Mechanical and Electrical Engineering, Xi’an Technological University, Xi’an 710021, China)
Wei Hong (School of Mechanical and Electrical Engineering, Xi’an Technological University, Xi’an 710021, China)
Monocular 3D object detection refers to detecting 3D objects using a single camera. This approach offers low sensor costs, high resolution, and rich texture information, making it widely adopted. However, monocular sensors face challenges from environmental factors like occlusion and truncation, leading to reduced detection accuracy. Additionally, the lack of depth information poses significant challenges for predicting 3D positions. To address these issues, this paper presents a monocular 3D object detection method based on improvements to MonoCD, designed to enhance detection accuracy and robustness in complex environments. In order to effectively obtain and integrate depth information, this paper designs a multi-branch depth prediction with weight sharing module. Furthermore, an adaptive focus mechanism is proposed to emphasize target regions while minimizing interference from irrelevant areas. The experimental results demonstrate that MonoDFNet achieves significant improvements over existing methods, with AP3D gains of +4.09% (Easy), +2.78% (Moderate), and +1.63% (Hard), confirming its effectiveness in 3D object detection.
https://www.mdpi.com/1424-8220/25/3/760
deep learning
monocular 3D detection
depth estimation
title MonoDFNet: Monocular 3D Object Detection with Depth Fusion and Adaptive Optimization
title_sort monodfnet monocular 3d object detection with depth fusion and adaptive optimization
topic deep learning
monocular 3D detection
depth estimation
url https://www.mdpi.com/1424-8220/25/3/760