PoseNet++: A multi-scale and optimized feature extraction network for high-precision human pose estimation.
Human pose estimation (HPE) has made significant progress with deep learning; however, it still faces challenges in handling occlusions, complex poses, and complex multi-person scenarios. To address these issues, we propose PoseNet++, a novel approach based on a 3-stacked hourglass architecture, inc...
| Main Authors: | Chao Lv, Geyao Ma |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS) 2025-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0326232 |
| _version_ | 1849424661529493504 |
|---|---|
| author | Chao Lv Geyao Ma |
| collection | DOAJ |
| description | Human pose estimation (HPE) has made significant progress with deep learning; however, it still faces challenges in handling occlusions, complex poses, and complex multi-person scenarios. To address these issues, we propose PoseNet++, a novel approach based on a 3-stacked hourglass architecture, incorporating three key innovations: the multi-scale spatial pyramid attention hourglass module (MSPAHM), coordinate-channel prior convolutional attention (C-CPCA), and the PinSK Bottleneck Residual Module (PBRM). MSPAHM enhances long-range channel dependencies, enabling the model to better capture structural relationships between limb joints, particularly under occlusion. C-CPCA combines coordinate attention (CA) and channel prior convolutional attention (CPCA) to prioritize keypoint regions and reduce confusion in complex multi-person scenarios. The PBRM improves pose estimation accuracy by optimizing the receptive field and convolutional kernel selection, thus enhancing the network's feature extraction capabilities for multi-scale and complex poses. On the MPII validation set, PoseNet++ improves the PCKh score by 3.3% relative to the baseline 3-stacked hourglass network, while reducing the number of model parameters and floating-point operations by 60.3% and 53.1%, respectively. Compared with other mainstream human pose estimation models from recent years, PoseNet++ achieves state-of-the-art performance on the MPII, LSP, COCO and CrowdPose datasets. At the same time, the model complexity of PoseNet++ is much lower than that of methods with similar accuracy. |
| format | Article |
| id | doaj-art-29115e0e1bb942b38efade113d885b32 |
| institution | Kabale University |
| issn | 1932-6203 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Public Library of Science (PLoS) |
| record_format | Article |
| series | PLoS ONE |
| spelling | doaj-art-29115e0e1bb942b38efade113d885b32 2025-08-20T03:30:04Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2025-01-01 20 6 e0326232 10.1371/journal.pone.0326232 PoseNet++: A multi-scale and optimized feature extraction network for high-precision human pose estimation. Chao Lv Geyao Ma https://doi.org/10.1371/journal.pone.0326232 |
| title | PoseNet++: A multi-scale and optimized feature extraction network for high-precision human pose estimation. |
| url | https://doi.org/10.1371/journal.pone.0326232 |
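
The description above reports accuracy as a PCKh score. As a rough illustration of how such a score is commonly computed, the sketch below counts a predicted joint as correct when it lies within a fraction `alpha` of the head size from its ground-truth position. The function name `pckh`, its parameters, and the toy coordinates are assumptions made for illustration only; this is not the authors' evaluation code, and the record does not state which threshold the paper uses.

```python
import math


def pckh(pred, truth, head_size, visible=None, alpha=0.5):
    """Fraction of visible joints predicted within alpha * head_size of the truth.

    pred, truth: lists of (x, y) joint coordinates in the same order.
    head_size:   reference head length for this person, in pixels.
    visible:     optional list of booleans; joints marked False are skipped.
    alpha:       tolerance factor (0.5 is a common choice, assumed here).
    """
    if visible is None:
        visible = [True] * len(truth)
    threshold = alpha * head_size
    hits = 0
    total = 0
    for (px, py), (tx, ty), vis in zip(pred, truth, visible):
        if not vis:
            continue  # unannotated joints do not count for or against the score
        total += 1
        if math.hypot(px - tx, py - ty) <= threshold:
            hits += 1
    return hits / total if total else 0.0


# Toy example (hypothetical coordinates): four joints, head size of 10 pixels.
truth = [(0, 0), (10, 0), (20, 0), (30, 0)]
pred = [(1, 0), (10, 4), (20, 6), (30, 0)]
score = pckh(pred, truth, head_size=10)
print(f"PCKh@0.5 = {score:.2f}")
```

In this toy case the third joint is 6 pixels off, which exceeds the 5-pixel tolerance, so three of the four joints count as correct.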