DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation

3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Co...

Full description

Saved in:
Bibliographic Details
Main Authors: Linzhan Zhong, Fangni Chen, Bolun Zheng, Rui Feng, Lei Zhang, Jian Wan
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10975753/
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Consistency-based Multi-Hypothesis Aggregation (CCMA). Moreover, it incorporate the Discrete Cosine Transform (DCT) for frequency-domain feature extraction. Specifically, the diffusion model generates diverse and plausible hypotheses, and CCMA aggregates them based on confidence and consistency, effectively addressing depth ambiguity. Additionally, we incorporate DCT into the self-attention mechanism to transform input data into the frequency domain, thereby enhancing feature extraction while significantly reducing computational complexity. To validate DCT-DiffPose, we conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. Our method achieves a 19% lower Mean Per Joint Position Error (MPJPE) and a 55% reduction in FLOPs compared to D3DP. The results demonstrate its excellent trade-off between accuracy and complexity.
ISSN:2169-3536