MG6D: A Deep Fusion Approach for 6D Pose Estimation With Mamba and Graph Convolution Network
Accurate and efficient 6D pose estimation is a fundamental technology in many industrial applications. While existing dense correspondence methods have shown progress, they face challenges in multimodal feature fusion under complex scenarios involving occlusions, illumination variations, and sensor noise. This paper proposes a novel 6D pose estimation framework that addresses these limitations through a hybrid Mamba-Graph architecture. The algorithm first introduces a panoramic attention fusion Mamba module, leveraging state-space modeling to capture long-range dependencies in multi-modal data while establishing cross-dimensional interactions between channel and spatial features to emphasize critical information. A dynamic graph convolutional adaptive fusion module is then designed to enable cross-modal geometric consistency modeling via multi-modal feature integration. Finally, a texture-geometry co-driven keypoint selection mechanism is proposed to ensure keypoint distributions satisfy both spatial uniformity and discriminability requirements. Experimental results on three common datasets demonstrate that the proposed algorithm achieves ADD(-S) metrics of 99.82%, 80.26%, and 97.2%, respectively. Notably, it exhibits significant advantages in pose estimation for objects with repetitive textures and high symmetry.
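The record's abstract states that the Mamba module leverages state-space modeling to capture long-range dependencies. The paper's actual module is not reproduced here; the following is a minimal sketch of the discrete linear state-space recurrence that underlies such models (h_t = a·h_{t-1} + b·x_t, y_t = c·h_t), assuming a scalar state per channel for clarity. The parameter values are illustrative only.

```python
# Minimal discrete state-space scan: h[t] = a*h[t-1] + b*x[t], y[t] = c*h[t].
# Scalar state for clarity; real Mamba blocks make (a, b, c) input-dependent
# ("selective") and run this recurrence per channel over long sequences.

def ssm_scan(x, a=0.9, b=0.5, c=1.0):
    """Run the linear recurrence over a 1-D input sequence x."""
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt      # state update: carries long-range context
        ys.append(c * h)        # readout at each step
    return ys

# Impulse response: a single input spike decays geometrically with factor a,
# which is how the state retains information over long horizons.
response = ssm_scan([1.0, 0.0, 0.0, 0.0])
```

Because `a < 1`, the impulse's influence fades geometrically rather than vanishing after a fixed window, which is the long-range-dependency property the abstract refers to.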
| Main Authors: | Jiaqi Zhu, Bin Li, Xinhua Zhao |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | 6D pose estimation; panoramic attention; fusion mamba; graph feature fusion |
| Online Access: | https://ieeexplore.ieee.org/document/11021472/ |
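The record's abstract describes a dynamic graph convolutional adaptive fusion module for cross-modal geometric consistency. The paper's exact module is not given here; below is a pure-Python sketch of the core idea behind dynamic graph convolution in the EdgeConv style: build a k-nearest-neighbour graph in feature space, form edge features (neighbour minus centre), and max-pool them back onto each point. Function names and the choice k=2 are illustrative assumptions.

```python
# EdgeConv-style dynamic graph aggregation (illustrative sketch).
# The graph is "dynamic" because neighbours are recomputed in feature space,
# not fixed by the input geometry.

def knn(points, k):
    """Indices of the k nearest neighbours of each point (excluding itself)."""
    idx = []
    for i, p in enumerate(points):
        order = sorted(range(len(points)),
                       key=lambda j: sum((points[j][c] - p[c]) ** 2
                                         for c in range(len(p))))
        idx.append([j for j in order if j != i][:k])
    return idx

def edge_aggregate(points, k=2):
    """Max-pool the (neighbour - centre) edge features onto each point."""
    out = []
    for i, nbrs in enumerate(knn(points, k)):
        edges = [[points[j][c] - points[i][c] for c in range(len(points[i]))]
                 for j in nbrs]
        out.append([max(e[c] for e in edges) for c in range(len(points[0]))])
    return out

pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
fused = edge_aggregate(pts, k=2)
```

A real implementation would apply a learned MLP to each edge feature before pooling; the sketch keeps only the graph construction and aggregation steps to show the data flow.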
| _version_ | 1850130477540704256 |
|---|---|
| author | Jiaqi Zhu; Bin Li; Xinhua Zhao |
| author_sort | Jiaqi Zhu |
| collection | DOAJ |
| description | Accurate and efficient 6D pose estimation is a fundamental technology in many industrial applications. While existing dense correspondence methods have shown progress, they face challenges in multimodal feature fusion under complex scenarios involving occlusions, illumination variations, and sensor noise. This paper proposes a novel 6D pose estimation framework that addresses these limitations through a hybrid Mamba-Graph architecture. The algorithm first introduces a panoramic attention fusion Mamba module, leveraging state-space modeling to capture long-range dependencies in multi-modal data while establishing cross-dimensional interactions between channel and spatial features to emphasize critical information. A dynamic graph convolutional adaptive fusion module is then designed to enable cross-modal geometric consistency modeling via multi-modal feature integration. Finally, a texture-geometry co-driven keypoint selection mechanism is proposed to ensure keypoint distributions satisfy both spatial uniformity and discriminability requirements. Experimental results on three common datasets demonstrate that the proposed algorithm achieves ADD(-S) metrics of 99.82%, 80.26%, and 97.2%, respectively. Notably, it exhibits significant advantages in pose estimation for objects with repetitive textures and high symmetry. |
| format | Article |
| id | doaj-art-bdec6f87e2784d32a9bedcaee4aeb743 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| record_updated | 2025-08-20T02:32:41Z |
| doi | 10.1109/ACCESS.2025.3575778 |
| article_number | 11021472 |
| volume | 13 |
| pages | 100433-100445 |
| orcid | Jiaqi Zhu: https://orcid.org/0009-0000-8116-3984 |
| affiliation | Jiaqi Zhu: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China |
| affiliation | Bin Li: Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China |
| affiliation | Xinhua Zhao: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China |
| title | MG6D: A Deep Fusion Approach for 6D Pose Estimation With Mamba and Graph Convolution Network |
| topic | 6D pose estimation; panoramic attention; fusion mamba; graph feature fusion |
| url | https://ieeexplore.ieee.org/document/11021472/ |
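The abstract's third contribution is a texture-geometry co-driven keypoint selection mechanism that balances spatial uniformity against discriminability. The paper's exact rule is not available from this record; the sketch below assumes one plausible form: a greedy selection that scores each candidate by its texture saliency multiplied by its distance to the already-chosen set (a saliency-weighted variant of farthest-point sampling). The function name, seed rule, and data are illustrative.

```python
# Greedy keypoint selection balancing spatial uniformity (farthest-point
# spacing) with per-point texture saliency. An assumed sketch of a
# texture-geometry co-driven rule, not the paper's exact mechanism.
import math

def select_keypoints(points, saliency, m):
    """Pick m indices maximising saliency * distance to the chosen set."""
    # Seed with the most texture-salient point.
    chosen = [max(range(len(points)), key=lambda i: saliency[i])]
    while len(chosen) < m:
        def score(i):
            d = min(math.dist(points[i], points[c]) for c in chosen)
            return saliency[i] * d      # already-chosen points score 0 (d = 0)
        chosen.append(max(range(len(points)), key=score))
    return chosen

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 1.0)]
sal = [1.0, 0.9, 0.8, 0.8]
keypoints = select_keypoints(pts, sal, 2)
```

The multiplicative score means a highly salient point very close to an existing keypoint still loses to a moderately salient point far away, which is how the two requirements in the abstract (discriminability and spatial uniformity) trade off against each other.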