DBF‐Net: A Deep Bidirectional Fusion Network for 6D Object Pose Estimation with Sparse Linear Transformer

6D object pose estimation, a critical component in computer vision and robotics domains, involves determining the 3D location and orientation of an object relative to a canonical reference frame. Recently, the widespread proliferation of RGB‐D sensors has precipitated a marked increase in interest t...

Bibliographic Details
Main Authors:	Xuan Fan, Tao An, Hongbo Gao, Tao Xie, Lijun Zhao, Ruifeng Li
Format:	Article
Language:	English
Published:	Wiley 2025-08-01
Series:	Advanced Intelligent Systems
Subjects:	6D object pose estimations, deep learning, feature representations, RGB‐D, transformers
Online Access:	https://doi.org/10.1002/aisy.202401001

_version_	1849230110385766400
author	Xuan Fan, Tao An, Hongbo Gao, Tao Xie, Lijun Zhao, Ruifeng Li
author_facet	Xuan Fan, Tao An, Hongbo Gao, Tao Xie, Lijun Zhao, Ruifeng Li
author_sort	Xuan Fan
collection	DOAJ
description	6D object pose estimation, a critical component in computer vision and robotics domains, involves determining the 3D location and orientation of an object relative to a canonical reference frame. Recently, the widespread proliferation of RGB‐D sensors has precipitated a marked increase in interest towards 6D pose estimation leveraging RGB‐D data. A deep bidirectional fusion network is developed, DBF‐Net, achieving efficient yet accurate 6D object pose estimation. Specifically, a sparse linear Transformer (SLT) with linear computation complexity is introduced to effectively leverage cross‐modal semantic resemblance during the feature extraction stage, such that it fully models semantic associations between various modalities and efficiently aggregates the globally enhanced features of each modality. Once acquiring two feature representations from two modalities, a feature balancer (FB) based on SLT is proposed to adaptively reconcile the importance of these feature representations. Leveraging the global receptive field of SLT, FB effectively eliminates the ambiguity induced by visual similarity in appearance representation or depth missing of reflective surfaces in geometry representations, thereby enhancing the generalization ability and robustness of the network. Experimental results demonstrate that DBF‐Net surpasses current state‐of‐the‐art works by nontrivial margins across multiple benchmarks. The code is available at https://github.com/Mrfanxuan/dbf_net.
format	Article
id	doaj-art-b6e14efd5326488aa1e03d3ca164d187
institution	Kabale University
issn	2640-4567
language	English
publishDate	2025-08-01
publisher	Wiley
record_format	Article
series	Advanced Intelligent Systems
spelling	doaj-art-b6e14efd5326488aa1e03d3ca164d187 | 2025-08-21T11:05:47Z | eng | Wiley | Advanced Intelligent Systems | 2640-4567 | 2025-08-01 | 7 | 8 | n/a | n/a | 10.1002/aisy.202401001 | DBF‐Net: A Deep Bidirectional Fusion Network for 6D Object Pose Estimation with Sparse Linear Transformer | Xuan Fan 0 | Tao An 1 | Hongbo Gao 2 | Tao Xie 3 | Lijun Zhao 4 | Ruifeng Li 5 | State Key Laboratory of Robotics and Systems Harbin Institute of Technology Harbin 150001 China | State Key Laboratory of Robotics and Systems Harbin Institute of Technology Harbin 150001 China | State Key Laboratory of Robotics and Systems Harbin Institute of Technology Harbin 150001 China | State Key Laboratory of Robotics and Systems Harbin Institute of Technology Harbin 150001 China | State Key Laboratory of Robotics and Systems Harbin Institute of Technology Harbin 150001 China | State Key Laboratory of Robotics and Systems Harbin Institute of Technology Harbin 150001 China | 6D object pose estimation, a critical component in computer vision and robotics domains, involves determining the 3D location and orientation of an object relative to a canonical reference frame. Recently, the widespread proliferation of RGB‐D sensors has precipitated a marked increase in interest towards 6D pose estimation leveraging RGB‐D data. A deep bidirectional fusion network is developed, DBF‐Net, achieving efficient yet accurate 6D object pose estimation. Specifically, a sparse linear Transformer (SLT) with linear computation complexity is introduced to effectively leverage cross‐modal semantic resemblance during the feature extraction stage, such that it fully models semantic associations between various modalities and efficiently aggregates the globally enhanced features of each modality. Once acquiring two feature representations from two modalities, a feature balancer (FB) based on SLT is proposed to adaptively reconcile the importance of these feature representations. Leveraging the global receptive field of SLT, FB effectively eliminates the ambiguity induced by visual similarity in appearance representation or depth missing of reflective surfaces in geometry representations, thereby enhancing the generalization ability and robustness of the network. Experimental results demonstrate that DBF‐Net surpasses current state‐of‐the‐art works by nontrivial margins across multiple benchmarks. The code is available at https://github.com/Mrfanxuan/dbf_net. | https://doi.org/10.1002/aisy.202401001 | 6D object pose estimations | deep learning | feature representations | RGB‐D | transformers
spellingShingle	Xuan Fan Tao An Hongbo Gao Tao Xie Lijun Zhao Ruifeng Li DBF‐Net: A Deep Bidirectional Fusion Network for 6D Object Pose Estimation with Sparse Linear Transformer Advanced Intelligent Systems 6D object pose estimations deep learning feature representations RGB‐D transformers
title	DBF‐Net: A Deep Bidirectional Fusion Network for 6D Object Pose Estimation with Sparse Linear Transformer
title_full	DBF‐Net: A Deep Bidirectional Fusion Network for 6D Object Pose Estimation with Sparse Linear Transformer
title_fullStr	DBF‐Net: A Deep Bidirectional Fusion Network for 6D Object Pose Estimation with Sparse Linear Transformer
title_full_unstemmed	DBF‐Net: A Deep Bidirectional Fusion Network for 6D Object Pose Estimation with Sparse Linear Transformer
title_short	DBF‐Net: A Deep Bidirectional Fusion Network for 6D Object Pose Estimation with Sparse Linear Transformer
title_sort	dbf net a deep bidirectional fusion network for 6d object pose estimation with sparse linear transformer
topic	6D object pose estimations, deep learning, feature representations, RGB‐D, transformers
url	https://doi.org/10.1002/aisy.202401001
work_keys_str_mv	AT xuanfan dbfnetadeepbidirectionalfusionnetworkfor6dobjectposeestimationwithsparselineartransformer AT taoan dbfnetadeepbidirectionalfusionnetworkfor6dobjectposeestimationwithsparselineartransformer AT hongbogao dbfnetadeepbidirectionalfusionnetworkfor6dobjectposeestimationwithsparselineartransformer AT taoxie dbfnetadeepbidirectionalfusionnetworkfor6dobjectposeestimationwithsparselineartransformer AT lijunzhao dbfnetadeepbidirectionalfusionnetworkfor6dobjectposeestimationwithsparselineartransformer AT ruifengli dbfnetadeepbidirectionalfusionnetworkfor6dobjectposeestimationwithsparselineartransformer
