DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation

3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Co...

Bibliographic Details
Main Authors: Linzhan Zhong, Fangni Chen, Bolun Zheng, Rui Feng, Lei Zhang, Jian Wan
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: 3D human pose estimation; discrete cosine transform; diffusion model; multi-hypothesis
Online Access: https://ieeexplore.ieee.org/document/10975753/
author Linzhan Zhong
Fangni Chen
Bolun Zheng
Rui Feng
Lei Zhang
Jian Wan
collection DOAJ
description 3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Consistency-based Multi-Hypothesis Aggregation (CCMA). Moreover, it incorporates the Discrete Cosine Transform (DCT) for frequency-domain feature extraction. Specifically, the diffusion model generates diverse and plausible hypotheses, and CCMA aggregates them based on confidence and consistency, effectively addressing depth ambiguity. Additionally, we incorporate DCT into the self-attention mechanism to transform input data into the frequency domain, thereby enhancing feature extraction while significantly reducing computational complexity. To validate DCT-DiffPose, we conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. Our method achieves a 19% lower Mean Per Joint Position Error (MPJPE) and a 55% reduction in FLOPs compared to D3DP. The results demonstrate its excellent trade-off between accuracy and complexity.
format Article
id doaj-art-81eccde04d4548ccb9616e6b869fe680
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-81eccde04d4548ccb9616e6b869fe680
2025-08-20T01:48:12Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2025-01-01, volume 13, pages 73319-73331
DOI 10.1109/ACCESS.2025.3563898
IEEE document 10975753
Linzhan Zhong (https://orcid.org/0009-0003-7533-9455), College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
Fangni Chen (https://orcid.org/0000-0002-4518-585X), College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
Bolun Zheng (https://orcid.org/0000-0001-8788-1725), School of Automation, Hangzhou Dianzi University, Hangzhou, China
Rui Feng, College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
Lei Zhang, College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
Jian Wan (https://orcid.org/0000-0001-9882-3029), College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
title DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation
topic 3D human pose estimation
discrete cosine transform
diffusion model
multi-hypothesis
url https://ieeexplore.ieee.org/document/10975753/
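The description reports accuracy as Mean Per Joint Position Error (MPJPE). As a minimal sketch of that metric, assuming its usual definition (the average Euclidean distance between each estimated joint and its ground-truth position), the following may help readers interpret the reported figure. It is not code from the paper; the function name `mpjpe` and the three-joint poses are hypothetical, with made-up coordinates chosen only for illustration.

```python
import math


def mpjpe(predicted, target):
    """Average Euclidean distance between matching joints.

    Both arguments are sequences of (x, y, z) tuples in the same order
    and the same unit; the result is in that unit.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = 0.0
    for p, t in zip(predicted, target):
        total += math.dist(p, t)  # per-joint position error
    return total / len(predicted)


# Hypothetical 3-joint pose in millimetres (illustrative values only).
ground_truth = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
estimate = [(3.0, 4.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 12.0)]

# Per-joint errors are 5, 0 and 12 mm, so the mean is 17/3 mm.
error_mm = mpjpe(estimate, ground_truth)
```

A lower value means the estimated skeleton lies closer to the ground truth on average, which is the sense in which the description's "19% lower MPJPE" is an improvement.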