MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba

Human pose estimation is an important research direction in the field of computer vision, which aims to accurately identify the position and posture of keypoints of the human body through images or videos. However, multi-person pose estimation yields false detection or missed detection in dense crow...

Bibliographic Details
Main Authors:	Jianqiang Zhang, Jing Hou, Qiusheng He, Zhengwei Yuan, Hao Xue
Format:	Article
Language:	English
Published:	MDPI AG 2024-12-01
Series:	Sensors
Subjects:	pose estimation Mamba downsampling feature fusion loss function
Online Access:	https://www.mdpi.com/1424-8220/24/24/8158

_version_	1850238920800862208
author	Jianqiang Zhang Jing Hou Qiusheng He Zhengwei Yuan Hao Xue
author_facet	Jianqiang Zhang Jing Hou Qiusheng He Zhengwei Yuan Hao Xue
author_sort	Jianqiang Zhang
collection	DOAJ
description	Human pose estimation is an important research direction in computer vision that aims to accurately identify the positions of human body keypoints, and hence body posture, from images or videos. However, multi-person pose estimation suffers from false and missed detections in dense crowds, and small targets remain difficult to detect. In this paper, we propose a Mamba-based human pose estimation method. First, we design a GMamba structure that serves as the backbone network for extracting human keypoints. A gating mechanism is introduced into the linear layer of Mamba, which allows the model to dynamically adjust its weights according to the input image and thus locate human keypoints more precisely. Second, GMamba as the backbone network can effectively handle the long-sequence problem. The direct use of convolutional downsampling reduces selectivity for different stages of information flow, so we use slice downsampling (SD) to halve the resolution of the feature map and then fuse local features from four different locations. The fusion of multi-channel information helps the model obtain rich pose information. Finally, we introduce an adaptive threshold focus loss (ATFL) that dynamically adjusts the weights of different keypoints, assigning higher weights to error-prone keypoints to strengthen the model’s attention to them. This improves the accuracy of keypoint identification under occlusion, complex backgrounds, and similar conditions, and significantly improves overall pose estimation performance and robustness to interference. Experimental results show that the proposed algorithm achieves an AP of 72.2 and an AP50 of 92.6 on the COCO 2017 validation set, an improvement of 1.1% in AP50 over the typical algorithm. The proposed method can effectively detect human body keypoints and provides stronger robustness and accuracy for human pose estimation in complex scenes.
format	Article
id	doaj-art-5e4221fa0c204dde92eebd459f2b8826
institution	OA Journals
issn	1424-8220
language	English
publishDate	2024-12-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj-art-5e4221fa0c204dde92eebd459f2b8826 2025-08-20T02:01:19Z eng MDPI AG Sensors 1424-8220 2024-12-01 24 24 8158 10.3390/s24248158 MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba Jianqiang Zhang 0 Jing Hou 1 Qiusheng He 2 Zhengwei Yuan 3 Hao Xue 4 School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China College of Modern Urban Construction Industry, Tianjin Chengjian University, Tianjin 300384, China School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China Human pose estimation is an important research direction in the field of computer vision, which aims to accurately identify the position and posture of keypoints of the human body through images or videos. However, multi-person pose estimation yields false detection or missed detection in dense crowds, and it is still difficult to detect small targets. In this paper, we propose a Mamba-based human pose estimation. First, we design a GMamba structure to be used as a backbone network to extract human keypoints. A gating mechanism is introduced into the linear layer of Mamba, which allows the model to dynamically adjust the weights according to the different input images to locate the human keypoints more precisely. Secondly, GMamba as the backbone network can effectively solve the long-sequence problem. The direct use of convolutional downsampling reduces selectivity for different stages of information flow. We used slice downsampling (SD) to reduce the resolution of the feature map to half the original size, and then fused local features from four different locations. The fusion of multi-channel information helped the model obtain rich pose information. Finally, we introduced an adaptive threshold focus loss (ATFL) to dynamically adjust the weights of different keypoints. We assigned higher weights to error-prone keypoints to strengthen the model’s attention to these points. Thus, we effectively improved the accuracy of keypoint identification in cases of occlusion, complex background, etc., and significantly improved the overall performance of attitude estimation and anti-interference ability. Experimental results showed that the AP and AP50 of the proposed algorithm on the COCO 2017 validation set were 72.2 and 92.6. Compared with the typical algorithm, it was improved by 1.1% on AP50. The proposed method can effectively detect the keypoints of the human body, and provides stronger robustness and accuracy for the estimation of human posture in complex scenes. https://www.mdpi.com/1424-8220/24/24/8158 pose estimation Mamba downsampling feature fusion loss function
spellingShingle	Jianqiang Zhang Jing Hou Qiusheng He Zhengwei Yuan Hao Xue MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba Sensors pose estimation Mamba downsampling feature fusion loss function
title	MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba
title_full	MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba
title_fullStr	MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba
title_full_unstemmed	MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba
title_short	MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba
title_sort	mambapose a human pose estimation based on gated feedforward network and mamba
topic	pose estimation Mamba downsampling feature fusion loss function
url	https://www.mdpi.com/1424-8220/24/24/8158
work_keys_str_mv	AT jianqiangzhang mambaposeahumanposeestimationbasedongatedfeedforwardnetworkandmamba AT jinghou mambaposeahumanposeestimationbasedongatedfeedforwardnetworkandmamba AT qiushenghe mambaposeahumanposeestimationbasedongatedfeedforwardnetworkandmamba AT zhengweiyuan mambaposeahumanposeestimationbasedongatedfeedforwardnetworkandmamba AT haoxue mambaposeahumanposeestimationbasedongatedfeedforwardnetworkandmamba

Similar Items