Robust Human Pose Estimation Method for Body-to-Body Occlusion Using RGB-D Fusion Neural Network

In this study, we propose a novel approach for human pose estimation (HPE) in occluded scenes by progressively fusing features extracted from RGB-D images, which contain RGB and depth images. Conventional bottom-up human pose estimation models that rely solely on RGB inputs often produce erroneous s...

Bibliographic Details
Main Authors: Jae-hyuk Yoon, Soon-kak Kwon
Format: Article
Language: English
Published: MDPI AG 2025-08-01
Series: Applied Sciences
Subjects: deep learning; computer vision; human pose estimation; RGB-D image
Online Access: https://www.mdpi.com/2076-3417/15/15/8746
_version_ 1849407548850962432
author Jae-hyuk Yoon
Soon-kak Kwon
author_facet Jae-hyuk Yoon
Soon-kak Kwon
author_sort Jae-hyuk Yoon
collection DOAJ
description In this study, we propose a novel approach for human pose estimation (HPE) in occluded scenes by progressively fusing features extracted from RGB-D images, which consist of RGB and depth images. Conventional bottom-up HPE models that rely solely on RGB inputs often produce erroneous skeletons when parts of a person’s body are obscured by another individual, because they lack the 3D topological information needed to infer body connectivity accurately. To address this limitation, we modify OpenPose, a bottom-up HPE model, to take a depth image as an additional input, thereby providing explicit 3D spatial cues. Each input modality is processed by a dedicated feature extractor. In addition to the two existing modules in each stage (joint connectivity estimation and joint confidence map estimation for the color image), we integrate a new module that estimates joint confidence maps for the depth image into the initial few stages. The confidence maps derived from the depth and RGB modalities are then fused at each stage and forwarded to the next, ensuring that 3D topological information from the depth image is effectively utilized for both joint localization and body part association. Experimental results on the NTU 120+ RGB-D Dataset show that the proposed approach achieves a 13.3% improvement in average recall compared to the original OpenPose model. The proposed method can enhance the performance of bottom-up HPE models in occluded scenes.
format Article
id doaj-art-edd3b72ae01c47ddb8a17cd80e9ae969
institution Kabale University
issn 2076-3417
language English
publishDate 2025-08-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-edd3b72ae01c47ddb8a17cd80e9ae969
2025-08-20T03:36:02Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-08-01
Volume 15, Issue 15, Article 8746
10.3390/app15158746
Robust Human Pose Estimation Method for Body-to-Body Occlusion Using RGB-D Fusion Neural Network
Jae-hyuk Yoon (Department of Computer Software Engineering, Dong-eui University, Busan 47340, Republic of Korea)
Soon-kak Kwon (Department of Computer Software Engineering, Dong-eui University, Busan 47340, Republic of Korea)
https://www.mdpi.com/2076-3417/15/15/8746
deep learning
computer vision
human pose estimation
RGB-D image
spellingShingle Jae-hyuk Yoon
Soon-kak Kwon
Robust Human Pose Estimation Method for Body-to-Body Occlusion Using RGB-D Fusion Neural Network
Applied Sciences
deep learning
computer vision
human pose estimation
RGB-D image
title Robust Human Pose Estimation Method for Body-to-Body Occlusion Using RGB-D Fusion Neural Network
title_sort robust human pose estimation method for body to body occlusion using rgb d fusion neural network
topic deep learning
computer vision
human pose estimation
RGB-D image
url https://www.mdpi.com/2076-3417/15/15/8746
work_keys_str_mv AT jaehyukyoon robusthumanposeestimationmethodforbodytobodyocclusionusingrgbdfusionneuralnetwork
AT soonkakkwon robusthumanposeestimationmethodforbodytobodyocclusionusingrgbdfusionneuralnetwork
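
The stage-wise fusion described in the abstract (a depth confidence-map module in the first few stages, with RGB and depth maps fused and forwarded to the next stage) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `fuse_maps`, `run_stages`, the `depth_weight` parameter, and above all the element-wise weighted average, which merely stands in for whatever fusion operator the authors actually use. The joint connectivity branch is omitted for brevity, and the stage functions are placeholders for learned network modules.

```python
from typing import Callable, List, Optional

# One joint's confidence map as rows x cols of scores in [0, 1].
Heatmap = List[List[float]]
# A stage takes backbone features and the previous fused map (None at stage 0).
Stage = Callable[[object, Optional[Heatmap]], Heatmap]


def fuse_maps(rgb: Heatmap, depth: Heatmap, depth_weight: float = 0.5) -> Heatmap:
    """Fuse two same-shaped confidence maps element-wise.

    ASSUMPTION: a weighted average is used purely for illustration;
    the paper's fusion operator may differ.
    """
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("confidence maps must have the same shape")
    w = depth_weight
    return [
        [(1.0 - w) * r + w * d for r, d in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb, depth)
    ]


def run_stages(
    rgb_stage: Stage,
    depth_stage: Stage,
    rgb_features: object,
    depth_features: object,
    num_stages: int = 3,
    depth_stages: int = 2,
) -> Heatmap:
    """Run a multi-stage refinement loop, fusing depth maps in early stages only."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    fused: Optional[Heatmap] = None
    for stage_index in range(num_stages):
        rgb_map = rgb_stage(rgb_features, fused)
        if stage_index < depth_stages:
            # Early stages: estimate a depth confidence map and fuse it in.
            depth_map = depth_stage(depth_features, fused)
            fused = fuse_maps(rgb_map, depth_map)
        else:
            # Later stages: only the RGB branch refines the map.
            fused = rgb_map
    assert fused is not None
    return fused
```

In this sketch the fused map from each stage is passed as the second argument to the next stage's modules, so depth-derived cues introduced early can still influence later RGB-only stages through that forwarded input.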