ROM-Pose: restoring occluded mask image for 2D human pose estimation

Bibliographic Details
Main Authors: Yunju Lee, Jihie Kim
Format: Article
Language: English
Published: PeerJ Inc. 2025-05-01
Series: PeerJ Computer Science
Online Access: https://peerj.com/articles/cs-2843.pdf
Description
Summary: Human pose estimation (HPE) is a field focused on estimating human poses by detecting key points in images. HPE methods include top-down and bottom-up approaches. The top-down approach is a two-stage process that first locates each person with a bounding box and then detects key points within that box, whereas the bottom-up approach directly detects individual key points and integrates them to estimate the overall pose. In this article, we address bounding box detection inaccuracies that arise in certain situations with the top-down method. Because the detected bounding boxes serve as input for the model, they affect the accuracy of pose estimation. Occlusion occurs when part of the target’s body is obscured by another person or an object, and it hinders the model’s ability to detect complete bounding boxes. Consequently, the model produces bounding boxes that omit the occluded parts, which are then excluded from the input used by the HPE model. To mitigate this issue, we introduce Restoring Occluded Mask Image for 2D Human Pose Estimation (ROM-Pose), which comprises a restoration model and an HPE model. The restoration model is designed to delineate the boundary between the target’s grayscale mask (occluded image) and the blocker’s grayscale mask (occludee image) using the specially created Whole Common Objects in Context (COCO) dataset. After identifying the boundary, the restoration model restores the occluded image. The restored image is then overlaid onto the RGB image for use in the HPE model. Because the input now carries information about the occluded parts, the bounding box includes these areas during detection, which improves the HPE model’s ability to recognize them. ROM-Pose achieved a 1.6% improvement in average precision (AP) compared to the baseline.
ISSN: 2376-5992
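
The summary states that the restored grayscale mask is overlaid onto the RGB image before pose estimation, but it does not say how the overlay is computed. The following is a minimal sketch of one plausible way to do it, assuming simple alpha blending with NumPy. The function name `overlay_mask`, the `alpha` parameter, and the blending rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def overlay_mask(rgb, mask, alpha=0.5):
    """Blend a grayscale mask onto an RGB image where the mask is non-zero.

    Hypothetical helper: alpha blending is an assumption made for
    illustration; the article's summary does not specify the overlay rule.

    rgb:  uint8 array of shape (H, W, 3)
    mask: uint8 array of shape (H, W), 0 where nothing was restored
    """
    if rgb.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")

    rgb_f = rgb.astype(np.float32)
    # Repeat the single grayscale channel so it can be mixed with RGB.
    mask_rgb = np.repeat(mask.astype(np.float32)[..., None], 3, axis=2)

    blended = (1.0 - alpha) * rgb_f + alpha * mask_rgb
    # Only pixels covered by the mask are changed; the rest stay as-is.
    covered = (mask > 0)[..., None]
    out = np.where(covered, blended, rgb_f)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    image = np.full((2, 2, 3), 100, dtype=np.uint8)
    restored = np.array([[0, 200], [0, 0]], dtype=np.uint8)
    print(overlay_mask(image, restored))
```

Under this assumption, pixels outside the restored mask are left unchanged, so a downstream person detector sees the original image plus a visible hint of the occluded body region.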