DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10975753/ |
| Summary: | 3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Consistency-based Multi-Hypothesis Aggregation (CCMA). Moreover, it incorporates the Discrete Cosine Transform (DCT) for frequency-domain feature extraction. Specifically, the diffusion model generates diverse and plausible hypotheses, and CCMA aggregates them based on confidence and consistency, effectively addressing depth ambiguity. Additionally, we incorporate DCT into the self-attention mechanism to transform input data into the frequency domain, thereby enhancing feature extraction while significantly reducing computational complexity. To validate DCT-DiffPose, we conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. Our method achieves a 19% lower Mean Per Joint Position Error (MPJPE) and a 55% reduction in FLOPs compared to D3DP. The results demonstrate its excellent trade-off between accuracy and complexity. |
| ISSN: | 2169-3536 |