DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation

3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Co...

Bibliographic Details
Main Authors:	Linzhan Zhong, Fangni Chen, Bolun Zheng, Rui Feng, Lei Zhang, Jian Wan
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	3D human pose estimation; discrete cosine transform; diffusion model; multi-hypothesis
Online Access:	https://ieeexplore.ieee.org/document/10975753/

_version_	1850281700150476800
author	Linzhan Zhong, Fangni Chen, Bolun Zheng, Rui Feng, Lei Zhang, Jian Wan
author_facet	Linzhan Zhong, Fangni Chen, Bolun Zheng, Rui Feng, Lei Zhang, Jian Wan
author_sort	Linzhan Zhong
collection	DOAJ
description	3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Consistency-based Multi-Hypothesis Aggregation (CCMA). Moreover, it incorporates the Discrete Cosine Transform (DCT) for frequency-domain feature extraction. Specifically, the diffusion model generates diverse and plausible hypotheses, and CCMA aggregates them based on confidence and consistency, effectively addressing depth ambiguity. Additionally, we incorporate DCT into the self-attention mechanism to transform input data into the frequency domain, thereby enhancing feature extraction while significantly reducing computational complexity. To validate DCT-DiffPose, we conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. Our method achieves a 19% lower Mean Per Joint Position Error (MPJPE) and a 55% reduction in FLOPs compared to D3DP. The results demonstrate its excellent trade-off between accuracy and complexity.
format	Article
id	doaj-art-81eccde04d4548ccb9616e6b869fe680
institution	OA Journals
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-81eccde04d4548ccb9616e6b869fe680; 2025-08-20T01:48:12Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 73319; 73331; 10.1109/ACCESS.2025.3563898; 10975753; DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation; Linzhan Zhong [0] https://orcid.org/0009-0003-7533-9455; Fangni Chen [1] https://orcid.org/0000-0002-4518-585X; Bolun Zheng [2] https://orcid.org/0000-0001-8788-1725; Rui Feng [3]; Lei Zhang [4]; Jian Wan [5] https://orcid.org/0000-0001-9882-3029; College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; School of Automation, Hangzhou Dianzi University, Hangzhou, China; College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; 3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Consistency-based Multi-Hypothesis Aggregation (CCMA). Moreover, it incorporates the Discrete Cosine Transform (DCT) for frequency-domain feature extraction. Specifically, the diffusion model generates diverse and plausible hypotheses, and CCMA aggregates them based on confidence and consistency, effectively addressing depth ambiguity. Additionally, we incorporate DCT into the self-attention mechanism to transform input data into the frequency domain, thereby enhancing feature extraction while significantly reducing computational complexity. To validate DCT-DiffPose, we conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. Our method achieves a 19% lower Mean Per Joint Position Error (MPJPE) and a 55% reduction in FLOPs compared to D3DP. The results demonstrate its excellent trade-off between accuracy and complexity.; https://ieeexplore.ieee.org/document/10975753/; 3D human pose estimation; discrete cosine transform; diffusion model; multi-hypothesis
spellingShingle	Linzhan Zhong Fangni Chen Bolun Zheng Rui Feng Lei Zhang Jian Wan DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation IEEE Access 3D human pose estimation discrete cosine transform diffusion model multi-hypothesis
title	DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation
title_full	DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation
title_fullStr	DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation
title_full_unstemmed	DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation
title_short	DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation
title_sort	dct diffpose a lightweight diffusion model with multi hypothesis for 3d human pose estimation
topic	3D human pose estimation; discrete cosine transform; diffusion model; multi-hypothesis
url	https://ieeexplore.ieee.org/document/10975753/
work_keys_str_mv	AT linzhanzhong dctdiffposealightweightdiffusionmodelwithmultihypothesisfor3dhumanposeestimation AT fangnichen dctdiffposealightweightdiffusionmodelwithmultihypothesisfor3dhumanposeestimation AT bolunzheng dctdiffposealightweightdiffusionmodelwithmultihypothesisfor3dhumanposeestimation AT ruifeng dctdiffposealightweightdiffusionmodelwithmultihypothesisfor3dhumanposeestimation AT leizhang dctdiffposealightweightdiffusionmodelwithmultihypothesisfor3dhumanposeestimation AT jianwan dctdiffposealightweightdiffusionmodelwithmultihypothesisfor3dhumanposeestimation
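
The description above reports accuracy as Mean Per Joint Position Error (MPJPE) and refers to aggregating several pose hypotheses. As a rough illustration only, the sketch below computes MPJPE as the average Euclidean distance between predicted and ground-truth joints, and merges hypotheses with a plain joint-wise mean. The function names `mpjpe` and `average_hypotheses` and the toy coordinates are hypothetical, and the plain mean is a stand-in baseline: the record does not say how CCMA weights confidence and consistency.

```python
import math


def mpjpe(pred, target):
    """Mean Per Joint Position Error: mean Euclidean distance over joints."""
    if len(pred) != len(target):
        raise ValueError("poses must have the same number of joints")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


def average_hypotheses(hypotheses):
    """Joint-wise mean of several 3D pose hypotheses.

    Assumption: a simple unweighted baseline, NOT the CCMA scheme
    named in the abstract.
    """
    count = len(hypotheses)
    joints = len(hypotheses[0])
    return [
        tuple(sum(h[j][axis] for h in hypotheses) / count for axis in range(3))
        for j in range(joints)
    ]


# Toy data in millimetres: two joints, two hypotheses that disagree in depth (z).
ground_truth = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
hyp_a = [(0.0, 0.0, 30.0), (100.0, 0.0, -40.0)]
hyp_b = [(0.0, 0.0, -30.0), (100.0, 0.0, 40.0)]

merged = average_hypotheses([hyp_a, hyp_b])
print(mpjpe(hyp_a, ground_truth))   # 35.0
print(mpjpe(merged, ground_truth))  # 0.0
```

In this contrived case the two hypotheses err in opposite depth directions, so their mean cancels the error exactly; real hypotheses will not be this symmetric, which is presumably why a weighted aggregation is used instead of a plain mean.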
