ScaleFormer architecture for scale invariant human pose estimation with enhanced mixed features
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-12620-4 |
| Summary: | Abstract Human pose estimation is a fundamental task in computer vision. However, existing methods suffer performance fluctuations when processing human targets at different scales, especially in outdoor scenes where target distances and viewing angles change frequently. This paper proposes ScaleFormer, a novel scale-invariant pose estimation framework that addresses the multi-scale pose estimation problem by combining the hierarchical feature extraction capabilities of Swin Transformer with the fine-grained feature enhancement mechanisms of ConvNeXt. We design an adaptive feature representation mechanism that enables the model to maintain consistent performance across different scales. Extensive experiments on the MPII human pose dataset demonstrate that ScaleFormer significantly outperforms existing methods on multiple metrics, including PCKh, scale consistency score, and keypoint mean average precision. Notably, under extreme scaling conditions (scaling factor 2.0), ScaleFormer's scale consistency score exceeds that of the baseline model by 48.8 percentage points; under 30% random occlusion, keypoint detection accuracy improves by 20.5 percentage points. Further experiments verify the complementary contributions of the two core components. These results indicate that ScaleFormer offers significant advantages in practical application scenarios and opens new research directions for pose estimation. (Illustrative sketches of the two-branch fusion idea and the PCKh metric follow this record.) |
| ISSN: | 2045-2322 |
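
The abstract describes a backbone that fuses a hierarchical transformer branch (Swin Transformer) with a fine-grained convolutional branch (ConvNeXt) through an adaptive feature representation mechanism. The paper's actual architecture is not reproduced in this record, so the following is only a minimal PyTorch sketch of that general idea: one attention-style branch, one ConvNeXt-style branch, and a learned gate that adaptively mixes them before a keypoint-heatmap head. All module names, dimensions, and the gating scheme are illustrative assumptions, not ScaleFormer's design.

```python
# Minimal sketch of a two-branch fusion backbone in the spirit of the
# abstract. Everything here is a stand-in: the real paper uses Swin
# Transformer and ConvNeXt stages, which are far deeper than these blocks.
import torch
import torch.nn as nn


class ConvNeXtLikeBlock(nn.Module):
    """Depthwise 7x7 conv + pointwise MLP, echoing ConvNeXt-style blocks."""
    def __init__(self, dim: int):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.GroupNorm(1, dim)  # stand-in for channel LayerNorm
        self.pw = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(),
                                nn.Conv2d(4 * dim, dim, 1))

    def forward(self, x):
        return x + self.pw(self.norm(self.dw(x)))


class AttentionLikeBlock(nn.Module):
    """Global self-attention over spatial tokens; a simplified stand-in for
    Swin's windowed attention (real Swin attends within shifted windows)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        n = self.norm(t)
        t = t + self.attn(n, n, n)[0]     # residual attention update
        return t.transpose(1, 2).reshape(b, c, h, w)


class FusionPoseNet(nn.Module):
    def __init__(self, dim: int = 64, num_joints: int = 16):
        super().__init__()
        self.stem = nn.Conv2d(3, dim, kernel_size=4, stride=4)  # 4x downsample
        self.conv_branch = ConvNeXtLikeBlock(dim)
        self.attn_branch = AttentionLikeBlock(dim)
        # Adaptive gate: per-channel mixing weights from pooled features.
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(2 * dim, dim, 1), nn.Sigmoid())
        self.head = nn.Conv2d(dim, num_joints, kernel_size=1)  # heatmaps

    def forward(self, x):
        f = self.stem(x)
        fc, fa = self.conv_branch(f), self.attn_branch(f)
        g = self.gate(torch.cat([fc, fa], dim=1))  # (B, dim, 1, 1) in [0, 1]
        return self.head(g * fc + (1 - g) * fa)    # per-joint heatmaps


# Smoke test: MPII-style 16-joint heatmaps at 1/4 input resolution.
net = FusionPoseNet()
heatmaps = net(torch.randn(2, 3, 256, 256))
print(heatmaps.shape)  # torch.Size([2, 16, 64, 64])
```

The channel-wise sigmoid gate is one simple way to realize "adaptive" mixing of the two branches; the paper may well use a more elaborate scale-aware mechanism.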
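
The abstract also reports results in PCKh, the standard MPII accuracy metric: a predicted joint counts as correct if it lies within a threshold fraction (conventionally 0.5) of the ground-truth head segment length, which makes the score invariant to person scale. A minimal sketch of PCKh@0.5 follows; the array shapes and visibility mask are illustrative assumptions, and this is not the paper's evaluation code.

```python
# Minimal sketch of PCKh@0.5. Distances are normalized by each person's
# head segment length, so the metric is comparable across person scales.
import numpy as np


def pckh(pred, gt, head_sizes, visible, thresh: float = 0.5) -> float:
    """pred, gt: (N, K, 2) joint coordinates; head_sizes: (N,) head segment
    lengths in pixels; visible: (N, K) boolean mask of annotated joints."""
    dists = np.linalg.norm(pred - gt, axis=-1)   # (N, K) pixel errors
    normed = dists / head_sizes[:, None]         # scale-normalized errors
    correct = (normed <= thresh) & visible
    return correct.sum() / max(visible.sum(), 1)


# Toy example: 2 people, 16 MPII joints each, small localization noise.
rng = np.random.default_rng(0)
gt = rng.uniform(0, 256, size=(2, 16, 2))
pred = gt + rng.normal(0, 5, size=gt.shape)
score = pckh(pred, gt, head_sizes=np.array([60.0, 80.0]),
             visible=np.ones((2, 16), dtype=bool))
print(f"PCKh@0.5 = {score:.3f}")
```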