MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba
Human pose estimation is an important research direction in the field of computer vision, which aims to accurately identify the position and posture of keypoints of the human body through images or videos. However, multi-person pose estimation yields false detection or missed detection in dense crow...
| Main Authors: | Jianqiang Zhang, Jing Hou, Qiusheng He, Zhengwei Yuan, Hao Xue |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-12-01 |
| Series: | Sensors |
| Subjects: | pose estimation; Mamba; downsampling; feature fusion; loss function |
| Online Access: | https://www.mdpi.com/1424-8220/24/24/8158 |
| _version_ | 1850238920800862208 |
|---|---|
| author | Jianqiang Zhang; Jing Hou; Qiusheng He; Zhengwei Yuan; Hao Xue |
| author_sort | Jianqiang Zhang |
| collection | DOAJ |
| description | Human pose estimation is an important research direction in computer vision that aims to accurately identify the positions of human body keypoints, and hence the pose, in images or videos. However, multi-person pose estimation suffers from false and missed detections in dense crowds, and small targets remain difficult to detect. In this paper, we propose a Mamba-based human pose estimation method. First, we design a GMamba structure that serves as the backbone network for extracting human keypoints. A gating mechanism is introduced into the linear layer of Mamba, which allows the model to dynamically adjust its weights according to the input image and locate human keypoints more precisely. Secondly, GMamba as the backbone network can effectively handle the long-sequence problem. Because direct convolutional downsampling reduces selectivity for different stages of the information flow, we use slice downsampling (SD) to reduce the resolution of the feature map to half its original size and then fuse local features from four different locations. The fusion of multi-channel information helps the model obtain rich pose information. Finally, we introduce an adaptive threshold focus loss (ATFL) that dynamically adjusts the weights of different keypoints, assigning higher weights to error-prone keypoints to strengthen the model’s attention to them. This improves the accuracy of keypoint identification under occlusion, complex backgrounds, and similar conditions, and significantly improves the overall performance and anti-interference ability of pose estimation. Experimental results show that the proposed algorithm achieves an AP of 72.2 and an AP50 of 92.6 on the COCO 2017 validation set, an improvement of 1.1% in AP50 over the typical algorithm. The proposed method effectively detects human body keypoints and provides stronger robustness and accuracy for human pose estimation in complex scenes. |
| format | Article |
| id | doaj-art-5e4221fa0c204dde92eebd459f2b8826 |
| institution | OA Journals |
| issn | 1424-8220 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-5e4221fa0c204dde92eebd459f2b8826; 2025-08-20T02:01:19Z; eng; MDPI AG; Sensors; 1424-8220; 2024-12-01; vol. 24, no. 24, art. 8158; doi:10.3390/s24248158; MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba; Jianqiang Zhang, Jing Hou, Qiusheng He, and Hao Xue (School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China); Zhengwei Yuan (College of Modern Urban Construction Industry, Tianjin Chengjian University, Tianjin 300384, China); https://www.mdpi.com/1424-8220/24/24/8158; pose estimation; Mamba; downsampling; feature fusion; loss function |
| title | MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba |
| topic | pose estimation; Mamba; downsampling; feature fusion; loss function |
| url | https://www.mdpi.com/1424-8220/24/24/8158 |
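
The description above says that slice downsampling (SD) halves the resolution of the feature map and then fuses local features from four different locations. The record does not give the exact operation, so the following is only a minimal NumPy sketch of one common way to do this, assuming a space-to-depth style slicing: every 2×2 neighborhood contributes its four pixels to four half-resolution maps, which are stacked along the channel axis. The function name `slice_downsample`, the channel-first layout, and the stacking order are assumptions made for illustration, not the authors' implementation. The learned fusion step that would follow the stacking is not specified in the abstract and is left out.

```python
import numpy as np


def slice_downsample(x: np.ndarray) -> np.ndarray:
    """Hypothetical slice downsampling: (C, H, W) -> (4C, H/2, W/2).

    Assumes a channel-first feature map with even height and width.
    """
    _, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")

    # Take the four pixels of every 2x2 neighborhood as four separate maps.
    top_left = x[:, 0::2, 0::2]
    top_right = x[:, 0::2, 1::2]
    bottom_left = x[:, 1::2, 0::2]
    bottom_right = x[:, 1::2, 1::2]

    # Stack the four slices along the channel axis; no values are discarded.
    return np.concatenate(
        [top_left, top_right, bottom_left, bottom_right], axis=0
    )


# Toy feature map with 2 channels and 4x4 spatial size.
feature_map = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
out = slice_downsample(feature_map)
print(out.shape)  # (8, 2, 2)
```

Unlike a strided convolution or pooling, this slicing keeps every input value: the spatial size is halved while the channel count grows fourfold, so a later layer can decide how to weight the four locations.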