MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba

Bibliographic Details
Main Authors: Jianqiang Zhang, Jing Hou, Qiusheng He, Zhengwei Yuan, Hao Xue
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Sensors
Subjects: pose estimation; Mamba; downsampling; feature fusion; loss function
Online Access: https://www.mdpi.com/1424-8220/24/24/8158
collection DOAJ
description Human pose estimation is an important research direction in computer vision that aims to accurately locate the keypoints of the human body, and thereby its posture, in images or videos. However, multi-person pose estimation still produces false or missed detections in dense crowds, and small targets remain difficult to detect. In this paper, we propose a Mamba-based human pose estimation method. First, we design a GMamba structure that serves as the backbone network for extracting human keypoints. A gating mechanism is introduced into the linear layer of Mamba, which allows the model to dynamically adjust its weights according to each input image and thus locate human keypoints more precisely. Second, GMamba as the backbone network can effectively handle long input sequences. Because direct convolutional downsampling reduces the selectivity of the information flow at different stages, we use slice downsampling (SD) to halve the resolution of the feature map and then fuse local features taken from four different positions. This fusion of multi-channel information helps the model obtain rich pose information. Finally, we introduce an adaptive threshold focus loss (ATFL) that dynamically adjusts the weights of different keypoints, assigning higher weights to error-prone keypoints to strengthen the model's attention to them. As a result, keypoint identification becomes more accurate under occlusion, complex backgrounds, and similar conditions, and overall pose estimation performance and resistance to interference improve significantly. Experimental results show that the proposed algorithm achieves an AP of 72.2 and an AP50 of 92.6 on the COCO 2017 validation set, a 1.1% improvement in AP50 over a typical baseline algorithm. The proposed method can effectively detect human keypoints and provides stronger robustness and accuracy for human pose estimation in complex scenes.
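The slice downsampling (SD) step summarized above can be illustrated with a short sketch. It assumes SD behaves like space-to-depth slicing (similar to the Focus layer used in YOLOv5): the four interleaved 2x2 sampling phases of a C x H x W feature map are stacked along the channel axis, and a 1x1 (pointwise) mixing step then fuses them. The function names `slice_downsample` and `pointwise_fuse`, the phase order, and the nested-list representation are hypothetical illustrations, not the authors' implementation.

```python
from typing import List

# Feature map as nested lists indexed [channel][row][column].
FeatureMap = List[List[List[float]]]


def slice_downsample(x: FeatureMap) -> FeatureMap:
    """Rearrange a (C, H, W) map into (4C, H/2, W/2) without dropping values.

    Hypothetical sketch: each output channel group holds one of the four
    interleaved 2x2 sampling phases, in the order
    (row 0, col 0), (row 1, col 0), (row 0, col 1), (row 1, col 1).
    """
    height, width = len(x[0]), len(x[0][0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")
    out: FeatureMap = []
    for dy, dx in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for channel in x:
            # Every second row starting at dy, every second column starting at dx.
            out.append([row[dx::2] for row in channel[dy::2]])
    return out


def pointwise_fuse(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Mix channels at every pixel, like a bias-free 1x1 convolution.

    weights[o][c] is the contribution of input channel c to output channel o.
    """
    if any(len(w_row) != len(x) for w_row in weights):
        raise ValueError("each weight row needs one entry per input channel")
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w * x[c][i][j] for c, w in enumerate(w_row)) for j in range(width)]
            for i in range(height)
        ]
        for w_row in weights
    ]


if __name__ == "__main__":
    # One 4x4 channel holding the values 0..15 row by row.
    x = [[[4 * r + c for c in range(4)] for r in range(4)]]
    phases = slice_downsample(x)
    print(len(phases), phases[0])  # 4 [[0, 2], [8, 10]]
    # Uniform weights average the four phases (same as 2x2 average pooling).
    print(pointwise_fuse(phases, [[0.25] * 4]))  # [[[2.5, 4.5], [10.5, 12.5]]]
```

The slicing itself is a lossless rearrangement: every input value appears exactly once in the output, at half the spatial resolution. With uniform weights of 0.25 the fusion reduces to 2x2 average pooling, whereas a learned weight matrix would let the network decide how to combine the four phases.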
id doaj-art-5e4221fa0c204dde92eebd459f2b8826
institution OA Journals
issn 1424-8220
doi 10.3390/s24248158
citation Sensors 2024, 24(24), 8158
affiliation School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China (Jianqiang Zhang, Jing Hou, Qiusheng He, Hao Xue)
College of Modern Urban Construction Industry, Tianjin Chengjian University, Tianjin 300384, China (Zhengwei Yuan)
topic pose estimation
Mamba
downsampling
feature fusion
loss function