Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection

Bibliographic Details
Main Authors: Jiajia Wu, Guangliang Han, Haining Wang, Hang Yang, Qingqing Li, Dongxu Liu, Fangjian Ye, Peixun Liu
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: RGB-D; salient object detection; multi-modal and multi-scale attention; progressive guided fusion
Online Access: https://ieeexplore.ieee.org/document/9606676/
author Jiajia Wu
Guangliang Han
Haining Wang
Hang Yang
Qingqing Li
Dongxu Liu
Fangjian Ye
Peixun Liu
author_sort Jiajia Wu
collection DOAJ
description Depth maps contain abundant spatial structure cues and are therefore widely used in saliency detection to improve accuracy. However, acquired depth maps are often of uneven quality because of interference from depth sensors and the external environment, which makes it challenging to minimize the disturbance from low-quality depth maps during fusion. To mitigate this issue and highlight salient objects, we propose a progressive guided fusion network (PGFNet) with multi-modal and multi-scale attention for RGB-D salient object detection. Specifically, we first present a multi-modal and multi-scale attention fusion model (MMAFM) that mines and exploits the complementarity of features across scales and modalities to achieve optimal fusion. Then, to strengthen the semantic expressiveness of the shallow-layer features, we design a multi-modal feature refinement mechanism (MFRM), which uses the high-level fusion feature to guide the enhancement of the original shallow-layer RGB and depth features before they are fused. In addition, a residual prediction module (RPM) is applied to further suppress background elements. The entire network adopts a top-down strategy to progressively extract and integrate valuable information. Experimental results on eight challenging benchmark datasets demonstrate, both qualitatively and quantitatively, the effectiveness of the proposed method compared with state-of-the-art methods.
format Article
id doaj-art-a36210973e84426688a7859caa3fc9f1
institution DOAJ
issn 2169-3536
language English
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-a36210973e84426688a7859caa3fc9f1
2025-08-20T03:13:03Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
150608
150622
10.1109/ACCESS.2021.3126338
9606676
Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection
Jiajia Wu (0) https://orcid.org/0000-0001-7667-4878
Guangliang Han (1)
Haining Wang (2)
Hang Yang (3) https://orcid.org/0000-0001-6027-1337
Qingqing Li (4) https://orcid.org/0000-0002-2339-2399
Dongxu Liu (5)
Fangjian Ye (6)
Peixun Liu (7)
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
School of Police Administration, People’s Public Security University of China, Beijing, China
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Institute of Forensic Science, Ministry of Public Security, Beijing, China
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
https://ieeexplore.ieee.org/document/9606676/
RGB-D
salient object detection
multi-modal and multi-scale attention
progressive guided fusion
title Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection
topic RGB-D
salient object detection
multi-modal and multi-scale attention
progressive guided fusion
url https://ieeexplore.ieee.org/document/9606676/