PoseNet++: A multi-scale and optimized feature extraction network for high-precision human pose estimation.

Bibliographic Details
Main Authors:	Chao Lv, Geyao Ma
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0326232

author	Chao Lv Geyao Ma
collection	DOAJ
description	Human pose estimation (HPE) has made significant progress with deep learning; however, it still faces challenges in handling occlusions, complex poses, and complex multi-person scenarios. To address these issues, we propose PoseNet++, a novel approach based on a 3-stacked hourglass architecture, incorporating three key innovations: the multi-scale spatial pyramid attention hourglass module (MSPAHM), coordinate-channel prior convolutional attention (C-CPCA), and the PinSK Bottleneck Residual Module (PBRM). MSPAHM enhances long-range channel dependencies, enabling the model to better capture structural relationships between limb joints, particularly under occlusion. C-CPCA combines coordinate attention (CA) and channel prior convolutional attention (CPCA) to prioritize keypoint regions and reduce confusion in complex multi-person scenarios. The PBRM improves pose estimation accuracy by optimizing the receptive field and convolutional kernel selection, thus enhancing the network's feature extraction capabilities for multi-scale and complex poses. On the MPII validation set, PoseNet++ improves the PCKh score by 3.3% relative to the baseline 3-stacked hourglass network, while reducing the number of model parameters and the number of floating-point operations by 60.3% and 53.1%, respectively. Compared with other mainstream human pose estimation models of recent years, PoseNet++ achieves state-of-the-art performance on the MPII, LSP, COCO and CrowdPose datasets. At the same time, the model complexity of PoseNet++ is much lower than that of methods with similar accuracy.
format	Article
id	doaj-art-29115e0e1bb942b38efade113d885b32
institution	Kabale University
issn	1932-6203
language	English
publishDate	2025-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj-art-29115e0e1bb942b38efade113d885b32 | 2025-08-20T03:30:04Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2025-01-01 | 20(6): e0326232 | 10.1371/journal.pone.0326232 | PoseNet++: A multi-scale and optimized feature extraction network for high-precision human pose estimation. | Chao Lv | Geyao Ma | https://doi.org/10.1371/journal.pone.0326232
title	PoseNet++: A multi-scale and optimized feature extraction network for high-precision human pose estimation.
url	https://doi.org/10.1371/journal.pone.0326232
