MG6D: A Deep Fusion Approach for 6D Pose Estimation With Mamba and Graph Convolution Network

Accurate and efficient 6D pose estimation is a fundamental technology in many industrial applications. While existing dense correspondence methods have shown progress, they face challenges in multimodal feature fusion under complex scenarios involving occlusions, illumination variations, and sensor noise. This paper proposes a novel 6D pose estimation framework that addresses these limitations through a hybrid Mamba-Graph architecture. The algorithm first introduces a panoramic attention fusion Mamba module, leveraging state-space modeling to capture long-range dependencies in multimodal data while establishing cross-dimensional interactions between channel and spatial features to emphasize critical information. A dynamic graph convolutional adaptive fusion module is then designed to enable cross-modal geometric consistency modeling via multimodal feature integration. Finally, a texture-geometry co-driven keypoint selection mechanism is proposed to ensure keypoint distributions satisfy both spatial uniformity and discriminability requirements. Experimental results on three common datasets demonstrate that the proposed algorithm achieves ADD(-S) metrics of 99.82%, 80.26%, and 97.2%, respectively. Notably, it exhibits significant advantages in pose estimation for objects with repetitive textures and high symmetry.
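For context on the ADD(-S) figures reported in the abstract: ADD measures the mean distance between corresponding 3D model points under the predicted and ground-truth poses, while ADD-S (used for symmetric objects) measures the mean distance from each transformed ground-truth point to its nearest transformed predicted point. The sketch below is the standard formulation of these metrics, not code from the paper; the point-set shapes and the 10%-of-diameter correctness threshold are the conventional choices, assumed here for illustration.

```python
import numpy as np

def add_metric(R_pred, t_pred, R_gt, t_gt, model_pts):
    """ADD: mean distance between corresponding model points (N x 3)
    transformed by the predicted and ground-truth rigid poses."""
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()

def adds_metric(R_pred, t_pred, R_gt, t_gt, model_pts):
    """ADD-S (symmetric objects): mean distance from each transformed
    ground-truth point to its nearest transformed predicted point."""
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    # Pairwise N x N distance matrix; acceptable for small model clouds.
    d = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return d.min(axis=1).mean()

def is_correct(dist, diameter, thresh=0.1):
    """Conventional criterion: pose counts as correct if the ADD(-S)
    distance is below 10% of the object's diameter."""
    return dist < thresh * diameter
```

A reported ADD(-S) score such as 99.82% is then the fraction of test frames for which `is_correct` holds.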

Bibliographic Details
Main Authors: Jiaqi Zhu, Bin Li, Xinhua Zhao
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: 6D pose estimation; panoramic attention; fusion Mamba; graph feature fusion
Online Access:https://ieeexplore.ieee.org/document/11021472/
author Jiaqi Zhu
Bin Li
Xinhua Zhao
collection DOAJ
description Accurate and efficient 6D pose estimation is a fundamental technology in many industrial applications. While existing dense correspondence methods have shown progress, they face challenges in multimodal feature fusion under complex scenarios involving occlusions, illumination variations, and sensor noise. This paper proposes a novel 6D pose estimation framework that addresses these limitations through a hybrid Mamba-Graph architecture. The algorithm first introduces a panoramic attention fusion Mamba module, leveraging state-space modeling to capture long-range dependencies in multi-modal data while establishing cross-dimensional interactions between channel and spatial features to emphasize critical information. A dynamic graph convolutional adaptive fusion module is then designed to enable cross-modal geometric consistency modeling via multi-modal feature integration. Finally, a texture-geometry co-driven keypoint selection mechanism is proposed to ensure keypoint distributions satisfy both spatial uniformity and discriminability requirements. Experimental results on three common datasets demonstrate that the proposed algorithm achieves ADD(-S) metrics of 99.82%, 80.26%, and 97.2%, respectively. Notably, it exhibits significant advantages in pose estimation for objects with repetitive textures and high symmetry.
format Article
id doaj-art-bdec6f87e2784d32a9bedcaee4aeb743
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling MG6D: A Deep Fusion Approach for 6D Pose Estimation With Mamba and Graph Convolution Network. IEEE Access, vol. 13, pp. 100433-100445, 2025-01-01. ISSN 2169-3536. DOI: 10.1109/ACCESS.2025.3575778. IEEE Xplore document 11021472.
Jiaqi Zhu (https://orcid.org/0009-0000-8116-3984), School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
Bin Li, Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China
Xinhua Zhao, School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
Online access: https://ieeexplore.ieee.org/document/11021472/
Keywords: 6D pose estimation; panoramic attention; fusion Mamba; graph feature fusion
title MG6D: A Deep Fusion Approach for 6D Pose Estimation With Mamba and Graph Convolution Network
topic 6D pose estimation
panoramic attention
fusion mamba
graph feature fusion
url https://ieeexplore.ieee.org/document/11021472/