Fusing Events and Frames with Coordinate Attention Gated Recurrent Unit for Monocular Depth Estimation

Monocular depth estimation is a central problem in computer vision and robot vision, aiming to recover the depth of a scene from a single image. In extreme conditions such as dynamic scenes or drastic lighting changes, monocular depth estimation methods based on conventional cameras often perform poorly. Event cameras capture brightness changes asynchronously but cannot acquire color or absolute brightness information. It is therefore natural to exploit the complementary strengths of event cameras and conventional cameras. However, how to effectively fuse event data and frames to improve the accuracy and robustness of monocular depth estimation remains an open problem. To address these challenges, this paper proposes a novel Coordinate Attention Gated Recurrent Unit (CAGRU). Unlike conventional ConvGRUs, the CAGRU departs from the practice of using convolutional layers for every gate: it casts coordinate attention as an attention gate and combines it with a convolutional gate. Coordinate attention explicitly models inter-channel dependencies together with spatial coordinate information. The coordinate attention gate, in conjunction with the convolutional gate, enables the network to model features spatially, temporally, and across channels. On this basis, the CAGRU can densify sparse event information in the spatial domain during temporal recursion, achieving more effective feature selection and fusion. It effectively integrates features from event cameras and standard cameras, further improving the accuracy and robustness of monocular depth estimation. Experimental results show that the proposed method achieves significant performance improvements on multiple public datasets.


Bibliographic Details
Main Authors: Huimei Duan, Chenggang Guo, Yuan Ou
Format: Article
Language:English
Published: MDPI AG 2024-12-01
Series:Sensors
Subjects:
Online Access:https://www.mdpi.com/1424-8220/24/23/7752
_version_ 1850106524886630400
author Huimei Duan
Chenggang Guo
Yuan Ou
author_facet Huimei Duan
Chenggang Guo
Yuan Ou
author_sort Huimei Duan
collection DOAJ
description Monocular depth estimation is a central problem in computer vision and robot vision, aiming to recover the depth of a scene from a single image. In extreme conditions such as dynamic scenes or drastic lighting changes, monocular depth estimation methods based on conventional cameras often perform poorly. Event cameras capture brightness changes asynchronously but cannot acquire color or absolute brightness information. It is therefore natural to exploit the complementary strengths of event cameras and conventional cameras. However, how to effectively fuse event data and frames to improve the accuracy and robustness of monocular depth estimation remains an open problem. To address these challenges, this paper proposes a novel Coordinate Attention Gated Recurrent Unit (CAGRU). Unlike conventional ConvGRUs, the CAGRU departs from the practice of using convolutional layers for every gate: it casts coordinate attention as an attention gate and combines it with a convolutional gate. Coordinate attention explicitly models inter-channel dependencies together with spatial coordinate information. The coordinate attention gate, in conjunction with the convolutional gate, enables the network to model features spatially, temporally, and across channels. On this basis, the CAGRU can densify sparse event information in the spatial domain during temporal recursion, achieving more effective feature selection and fusion. It effectively integrates features from event cameras and standard cameras, further improving the accuracy and robustness of monocular depth estimation. Experimental results show that the proposed method achieves significant performance improvements on multiple public datasets.
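The abstract describes a GRU cell whose update gate is a coordinate attention map (separate pooling along height and width, so positional information on each axis is preserved) while the reset gate remains convolutional. The following minimal NumPy sketch illustrates that gating structure only; the mean pooling, the 1x1 pointwise weights standing in for the paper's convolutions, and all tensor shapes are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coord_attention_gate(x, wh, ww):
    """Coordinate attention as a gate: x (2C, H, W) -> gate (C, H, W).

    Pools along width and height separately, forms per-axis sigmoid
    weights, then combines them into a full spatial gate in (0, 1).
    wh, ww are (C, 2C) pointwise weights (hypothetical shapes)."""
    pool_h = x.mean(axis=2)            # (2C, H): pooled over width
    pool_w = x.mean(axis=1)            # (2C, W): pooled over height
    ah = sigmoid(wh @ pool_h)          # (C, H) attention along height
    aw = sigmoid(ww @ pool_w)          # (C, W) attention along width
    return ah[:, :, None] * aw[:, None, :]   # (C, H, W)

def pointwise(w, x):
    """1x1 convolution as a channel mix: w (Cout, Cin), x (Cin, H, W)."""
    return np.einsum('oc,chw->ohw', w, x)

def cagru_step(h_prev, x, params):
    """One CAGRU step: coordinate-attention update gate z plus a
    convolutional reset gate r, following the standard GRU update."""
    wh, ww, wr, ur, wc, uc = params
    z = coord_attention_gate(np.concatenate([x, h_prev], axis=0), wh, ww)
    r = sigmoid(pointwise(wr, x) + pointwise(ur, h_prev))
    h_cand = np.tanh(pointwise(wc, x) + pointwise(uc, r * h_prev))
    return (1.0 - z) * h_prev + z * h_cand

# Tiny demo with random weights (shapes are illustrative only).
rng = np.random.default_rng(0)
C, H, W = 4, 6, 8
x = rng.standard_normal((C, H, W))       # e.g. fused event/frame features
h = np.zeros((C, H, W))                  # initial hidden state
params = (0.1 * rng.standard_normal((C, 2 * C)),
          0.1 * rng.standard_normal((C, 2 * C)),
          *(0.1 * rng.standard_normal((C, C)) for _ in range(4)))
h = cagru_step(h, x, params)
```

Because the attention gate factors into a height term and a width term, it stays cheap while still letting the recurrence re-weight sparse event features by spatial position at every time step.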
format Article
id doaj-art-d8f876e162a642708745f637bcd84ffc
institution OA Journals
issn 1424-8220
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-d8f876e162a642708745f637bcd84ffc2025-08-20T02:38:48ZengMDPI AGSensors1424-82202024-12-012423775210.3390/s24237752Fusing Events and Frames with Coordinate Attention Gated Recurrent Unit for Monocular Depth EstimationHuimei Duan0Chenggang Guo1Yuan Ou2School of Computer and Software Engineering, Xihua University, Chengdu 610039, ChinaSchool of Computer and Software Engineering, Xihua University, Chengdu 610039, ChinaSchool of Computer and Software Engineering, Xihua University, Chengdu 610039, ChinaMonocular depth estimation is a central problem in computer vision and robot vision, aiming at obtaining the depth information of a scene from a single image. In some extreme environments such as dynamics or drastic lighting changes, monocular depth estimation methods based on conventional cameras often perform poorly. Event cameras are able to capture brightness changes asynchronously but are not able to acquire color and absolute brightness information. Thus, it is an ideal choice to make full use of the complementary advantages of event cameras and conventional cameras. However, how to effectively fuse event data and frames to improve the accuracy and robustness of monocular depth estimation remains an urgent problem. To overcome these challenges, a novel Coordinate Attention Gated Recurrent Unit (CAGRU) is proposed in this paper. Unlike the conventional ConvGRUs, our CAGRU abandons the conventional practice of using convolutional layers for all the gates and innovatively designs the coordinate attention as an attention gate and combines it with the convolutional gate. Coordinate attention explicitly models inter-channel dependencies and coordinate information in space. The coordinate attention gate in conjunction with the convolutional gate enable the network to model feature information spatially, temporally, and internally across channels. 
Based on this, the CAGRU can enhance the information density of the sparse events in the spatial domain in the recursive process of temporal information, thereby achieving more effective feature screening and fusion. It can effectively integrate feature information from event cameras and standard cameras, further improving the accuracy and robustness of monocular depth estimation. The experimental results show that the method proposed in this paper achieves significant performance improvements on different public datasets.https://www.mdpi.com/1424-8220/24/23/7752monocular depth estimationevent camerascoordinate attentiongate recurrent units
spellingShingle Huimei Duan
Chenggang Guo
Yuan Ou
Fusing Events and Frames with Coordinate Attention Gated Recurrent Unit for Monocular Depth Estimation
Sensors
monocular depth estimation
event cameras
coordinate attention
gate recurrent units
title Fusing Events and Frames with Coordinate Attention Gated Recurrent Unit for Monocular Depth Estimation
title_full Fusing Events and Frames with Coordinate Attention Gated Recurrent Unit for Monocular Depth Estimation
title_fullStr Fusing Events and Frames with Coordinate Attention Gated Recurrent Unit for Monocular Depth Estimation
title_full_unstemmed Fusing Events and Frames with Coordinate Attention Gated Recurrent Unit for Monocular Depth Estimation
title_short Fusing Events and Frames with Coordinate Attention Gated Recurrent Unit for Monocular Depth Estimation
title_sort fusing events and frames with coordinate attention gated recurrent unit for monocular depth estimation
topic monocular depth estimation
event cameras
coordinate attention
gate recurrent units
url https://www.mdpi.com/1424-8220/24/23/7752
work_keys_str_mv AT huimeiduan fusingeventsandframeswithcoordinateattentiongatedrecurrentunitformonoculardepthestimation
AT chenggangguo fusingeventsandframeswithcoordinateattentiongatedrecurrentunitformonoculardepthestimation
AT yuanou fusingeventsandframeswithcoordinateattentiongatedrecurrentunitformonoculardepthestimation