DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation
3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Co...
| Main Authors: | Linzhan Zhong, Fangni Chen, Bolun Zheng, Rui Feng, Lei Zhang, Jian Wan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | 3D human pose estimation; discrete cosine transform; diffusion model; multi-hypothesis |
| Online Access: | https://ieeexplore.ieee.org/document/10975753/ |
| _version_ | 1850281700150476800 |
|---|---|
| author | Linzhan Zhong, Fangni Chen, Bolun Zheng, Rui Feng, Lei Zhang, Jian Wan |
| collection | DOAJ |
| description | 3D human pose estimation is a crucial task in computer vision with extensive applications, yet it remains challenging due to depth ambiguity and constraints on computational efficiency. In this paper, we propose DCT-DiffPose, a novel framework that integrates a diffusion model with Confidence and Consistency-based Multi-Hypothesis Aggregation (CCMA) and incorporates the Discrete Cosine Transform (DCT) for frequency-domain feature extraction. Specifically, the diffusion model generates diverse and plausible hypotheses, and CCMA aggregates them based on confidence and consistency, effectively addressing depth ambiguity. Additionally, we incorporate DCT into the self-attention mechanism to transform input data into the frequency domain, thereby enhancing feature extraction while significantly reducing computational complexity. To validate DCT-DiffPose, we conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. Our method achieves a 19% lower Mean Per Joint Position Error (MPJPE) and a 55% reduction in FLOPs compared to D3DP. The results demonstrate its excellent trade-off between accuracy and complexity. |
| format | Article |
| id | doaj-art-81eccde04d4548ccb9616e6b869fe680 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record: doaj-art-81eccde04d4548ccb9616e6b869fe680; timestamp: 2025-08-20T01:48:12Z; language: eng; publisher: IEEE; journal: IEEE Access; ISSN: 2169-3536; date: 2025-01-01; volume 13, pages 73319-73331; DOI: 10.1109/ACCESS.2025.3563898; IEEE Xplore document: 10975753. Authors and affiliations: Linzhan Zhong (https://orcid.org/0009-0003-7533-9455; College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China); Fangni Chen (https://orcid.org/0000-0002-4518-585X; College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China); Bolun Zheng (https://orcid.org/0000-0001-8788-1725; School of Automation, Hangzhou Dianzi University, Hangzhou, China); Rui Feng (College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China); Lei Zhang (College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China); Jian Wan (https://orcid.org/0000-0001-9882-3029; College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China). URL: https://ieeexplore.ieee.org/document/10975753/. Keywords: 3D human pose estimation; discrete cosine transform; diffusion model; multi-hypothesis |
| title | DCT-DiffPose: A Lightweight Diffusion Model With Multi-Hypothesis for 3D Human Pose Estimation |
| topic | 3D human pose estimation; discrete cosine transform; diffusion model; multi-hypothesis |
| url | https://ieeexplore.ieee.org/document/10975753/ |
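
The description reports accuracy as Mean Per Joint Position Error (MPJPE). As a reading aid, the sketch below shows the standard definition of that metric: the mean Euclidean distance between predicted and ground-truth 3D joints. It is not taken from the paper's code; the function name `mpjpe` and the two-joint coordinates are hypothetical values chosen only for illustration.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joints.

    pred, gt: sequences of (x, y, z) joint positions, listed in the same
    joint order and expressed in the same units (e.g. millimeters).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    # math.dist computes the Euclidean distance between two points.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Hypothetical two-joint pose: the first joint is off by 5 units,
# the second joint is predicted exactly.
ground_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 100.0)]
prediction = [(3.0, 4.0, 0.0), (0.0, 0.0, 100.0)]

print(mpjpe(prediction, ground_truth))  # prints 2.5
```

A lower value means the predicted skeleton lies closer to the ground truth, so the "19% lower MPJPE" in the description is a relative reduction of this average distance.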