Leveraging modality‐specific and shared features for RGB‐T salient object detection


Bibliographic Details
Main Authors: Shuo Wang, Gang Yang, Qiqi Xu, Xun Dai
Format: Article
Language:English
Published: Wiley 2024-12-01
Series:IET Computer Vision
Subjects:
Online Access:https://doi.org/10.1049/cvi2.12307
_version_ 1850059651952934912
author Shuo Wang
Gang Yang
Qiqi Xu
Xun Dai
author_facet Shuo Wang
Gang Yang
Qiqi Xu
Xun Dai
author_sort Shuo Wang
collection DOAJ
description Abstract Most existing RGB‐T salient object detection methods are based on a dual‐stream encoding, single‐stream decoding network architecture. These models rely heavily on the quality of the fused features, which often emphasise modality‐shared features while overlooking modality‐specific features, and thus fail to fully utilise the rich information contained in multi‐modality data. To this end, a modality separate tri‐stream net (MSTNet), which consists of a tri‐stream encoding (TSE) structure and a tri‐stream decoding (TSD) structure, is proposed. The TSE explicitly separates and extracts the modality‐shared and modality‐specific features to improve the utilisation of multi‐modality data. In addition, based on hybrid‐attention and cross‐attention mechanisms, the authors design an enhanced complementary fusion (ECF) module, which fully considers the complementarity between the features to be fused and realises high‐quality feature fusion. Furthermore, in TSD, the quality of the uni‐modality features is ensured under the constraint of supervision. Finally, to make full use of the rich multi‐level and multi‐scale decoding features contained in TSD, the authors design an adaptive multi‐scale decoding module and a multi‐stream feature aggregation module to improve the decoding capability. Extensive experiments on three public datasets show that MSTNet outperforms 14 state‐of‐the‐art methods, demonstrating that it extracts and utilises multi‐modality information more adequately and obtains more complete and richer features, thus improving performance. The code will be released at https://github.com/JOOOOKII/MSTNet.
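The ECF module described above fuses features from the two modalities via hybrid‐ and cross‐attention. The paper's actual module is not reproduced here; the following is only a minimal NumPy sketch of the general cross‐attention fusion idea it builds on, in which tokens of each modality attend to tokens of the other and the mutually enhanced features are combined. All function names, shapes, and the averaging step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product cross-attention: tokens from one modality
    (queries) attend to tokens from the other modality (context,
    used as both keys and values). Shapes: (N, C) -> (N, C)."""
    d_k = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d_k)   # (Nq, Nc)
    attn = softmax(scores, axis=-1)               # rows sum to 1
    return attn @ context                         # (Nq, C)

def fuse_modalities(rgb, thermal):
    """Symmetric fusion sketch: each modality is enhanced by
    attending to the other (with a residual connection), then the
    two enhanced streams are averaged into one fused feature map."""
    rgb_enhanced = rgb + cross_attention(rgb, thermal)
    thermal_enhanced = thermal + cross_attention(thermal, rgb)
    return 0.5 * (rgb_enhanced + thermal_enhanced)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.standard_normal((16, 32))      # 16 spatial tokens, 32 channels
    thermal = rng.standard_normal((16, 32))
    fused = fuse_modalities(rgb, thermal)
    print(fused.shape)                       # (16, 32)
```

In practice such fusion operates on convolutional feature maps reshaped into token sequences, with learned query/key/value projections; this sketch omits the projections to keep the attention mechanics visible.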
format Article
id doaj-art-7ef0a104dfe24ae29944b911265a682c
institution DOAJ
issn 1751-9632
1751-9640
language English
publishDate 2024-12-01
publisher Wiley
record_format Article
series IET Computer Vision
spelling doaj-art-7ef0a104dfe24ae29944b911265a682c 2025-08-20T02:50:49Z eng Wiley IET Computer Vision 1751-9632, 1751-9640 2024-12-01 Vol. 18, Iss. 8, pp. 1285-1299 10.1049/cvi2.12307 Leveraging modality‐specific and shared features for RGB‐T salient object detection Shuo Wang, Gang Yang, Qiqi Xu, Xun Dai (Northeastern University, Shenyang, China) https://doi.org/10.1049/cvi2.12307 computer vision; learning (artificial intelligence)
spellingShingle Shuo Wang
Gang Yang
Qiqi Xu
Xun Dai
Leveraging modality‐specific and shared features for RGB‐T salient object detection
IET Computer Vision
computer vision
learning (artificial intelligence)
title Leveraging modality‐specific and shared features for RGB‐T salient object detection
title_full Leveraging modality‐specific and shared features for RGB‐T salient object detection
title_fullStr Leveraging modality‐specific and shared features for RGB‐T salient object detection
title_full_unstemmed Leveraging modality‐specific and shared features for RGB‐T salient object detection
title_short Leveraging modality‐specific and shared features for RGB‐T salient object detection
title_sort leveraging modality specific and shared features for rgb t salient object detection
topic computer vision
learning (artificial intelligence)
url https://doi.org/10.1049/cvi2.12307
work_keys_str_mv AT shuowang leveragingmodalityspecificandsharedfeaturesforrgbtsalientobjectdetection
AT gangyang leveragingmodalityspecificandsharedfeaturesforrgbtsalientobjectdetection
AT qiqixu leveragingmodalityspecificandsharedfeaturesforrgbtsalientobjectdetection
AT xundai leveragingmodalityspecificandsharedfeaturesforrgbtsalientobjectdetection