Origin centric and part based pose decomposition for 3D human pose estimation

Bibliographic Details
Main Authors:	Zhijie Lin, Jinxin Yao, Juan Huang, Jingjing Chen, Yingying Xu, Lu Yang, Lei Zhao, Wei Xing
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-08-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-025-16381-y

Description
Summary:	Transformer-based approaches have recently made significant advances in 3D human pose estimation from 2D inputs. Existing methods typically either process the entire 2D skeleton to extract global features or break it into independent parts to learn local features. However, capturing the spatial dependencies of the entire 2D skeleton does not effectively facilitate learning local spatial features, while partitioning the skeleton into independent segments disrupts the relevance of individual joints to the whole. In this paper, we propose a novel Origin-centric Part Transformer (OPFormer) block to address this issue through two steps: Skeleton Separation and Skeleton Recombination. First, Skeleton Separation divides the 2D skeleton into several distinct parts, enabling the extraction of fine-grained local spatial features that accurately reflect the geometric structure of the human body. Second, we introduce the concept of a human skeleton Origin, which serves as a central hub to reconnect the different parts through Skeleton Recombination. The resulting local features, when fused with global features from the Spatial Transformer Encoder, yield more accurate 3D results. Comprehensive experiments on the Human3.6M and MPI-INF-3DHP benchmark datasets verify that our approach attains state-of-the-art performance. Notably, OPFormer achieves a Mean Per Joint Position Error (MPJPE) of 37.6 mm on the Human3.6M dataset without any additional training data.
ISSN:	2045-2322
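
Note on the metric: the summary reports accuracy as Mean Per Joint Position Error (MPJPE). As it is commonly defined, MPJPE is the Euclidean distance between each predicted 3D joint and its ground-truth position, averaged over all joints and frames. The sketch below illustrates that common definition only. The function name `mpjpe` and the array layout `(frames, joints, 3)` are assumptions made for this example, and it does not reproduce the authors' evaluation protocol (for instance, any root-joint alignment they may apply beforehand).

```python
import numpy as np


def mpjpe(pred, target):
    """Mean Euclidean distance between predicted and ground-truth joints.

    Both inputs are assumed to have shape (..., joints, 3) and to be in the
    same unit (for example millimetres); the result is in that unit.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Per-joint distance over the last (x, y, z) axis, then a global mean.
    return float(np.linalg.norm(pred - target, axis=-1).mean())


# Toy example: one frame, two joints. The first joint is off by (3, 4, 0),
# a distance of 5; the second joint is exact, so the mean is 2.5.
target = np.zeros((1, 2, 3))
pred = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]])
print(mpjpe(pred, target))  # 2.5
```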

Similar Items