The effect of depth data and upper limb impairment on lightweight monocular RGB human pose estimation models

Bibliographic Details
Main Authors: Gloria-Edith Boudreault-Morales, Cesar Marquez-Chin, Xilin Liu, José Zariffa
Format: Article
Language:English
Published: BMC 2025-02-01
Series:BioMedical Engineering OnLine
Subjects: Pose estimation, Depth data, Motion capture, Rehabilitation, Stroke
Online Access:https://doi.org/10.1186/s12938-025-01347-y
author Gloria-Edith Boudreault-Morales
Cesar Marquez-Chin
Xilin Liu
José Zariffa
collection DOAJ
description Abstract Background and objectives Markerless vision-based human pose estimation (HPE) is a promising avenue towards scalable data collection in rehabilitation. Deploying this technology will require self-contained systems able to process data efficiently and accurately. The aims of this work are to (1) determine how depth data affects the performance (accuracy and speed) of lightweight monocular red–green–blue (RGB) HPE models, to inform sensor selection, and (2) validate HPE models using data from individuals with physical impairments. Methods Two HPE models were investigated: Dite-HRNet and MobileHumanPose (capable of 2D and 3D HPE, respectively). The models were modified to accept depth data as an additional input using three fusion techniques: an early fusion method, a simple intermediate fusion method (using concatenation), and a complex intermediate fusion method (using dedicated fusion blocks, additional convolutional layers, and concatenation). All fusion techniques used RGB-D data, in contrast to the original models, which used RGB data only. The models were trained, validated, and tested on the CMU Panoptic and Human3.6M data sets as well as a custom data set. The custom data set includes RGB-D and optical motion capture data of 15 uninjured and 12 post-stroke individuals performing movements involving their upper limbs. Model performance was assessed in terms of accuracy and computational efficiency, using Mean Per Joint Position Error (MPJPE), floating point operations (FLOPs), and frame rate (frames per second) as evaluation metrics. Results The early fusion architecture consistently delivered the lowest MPJPE in both the 2D and 3D HPE cases while achieving FLOPs and frame rates similar to those of its RGB counterpart. These results held regardless of the data used for training and testing the HPE models. Comparisons between the uninjured and stroke groups did not reveal a significant effect (all p values > 0.36) of motor impairment on the accuracy of any model. Conclusions Including depth data through an early fusion architecture improves the accuracy–efficiency trade-off of the HPE model, and HPE accuracy is not affected by the presence of physical impairments. These results suggest that combining depth data with RGB data is beneficial to HPE, and that models trained on data collected from uninjured individuals can generalize to persons with physical impairments.
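As a concrete illustration of the early fusion approach and the accuracy metric described in the abstract, the sketch below appends the depth map to the RGB image as a fourth input channel (so only the first convolution of the backbone changes, leaving FLOPs nearly unchanged) and computes MPJPE as the mean Euclidean distance between predicted and ground-truth joints. This is a minimal sketch assuming a PyTorch-style pipeline; the layer sizes and function names are illustrative and do not come from the published Dite-HRNet or MobileHumanPose implementations.

```python
# Minimal sketch (assumed PyTorch workflow, hypothetical layer sizes) of
# early RGB-D fusion and the MPJPE evaluation metric.
import torch
import torch.nn as nn

def early_fusion(rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
    """Concatenate a depth map onto an RGB image along the channel axis.

    rgb: (N, 3, H, W), depth: (N, 1, H, W) -> (N, 4, H, W)
    """
    return torch.cat([rgb, depth], dim=1)

# Hypothetical first layer of a lightweight backbone: only the input
# channel count changes from 3 to 4, so the overall FLOPs barely change.
stem = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=3, stride=2, padding=1)

def mpjpe(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Mean Per Joint Position Error: average Euclidean distance between
    predicted and ground-truth joints.

    pred, gt: (N, J, 3) for 3D HPE (or (N, J, 2) for 2D HPE).
    """
    return torch.norm(pred - gt, dim=-1).mean()

if __name__ == "__main__":
    rgb = torch.rand(1, 3, 256, 256)
    depth = torch.rand(1, 1, 256, 256)
    features = stem(early_fusion(rgb, depth))  # (1, 32, 128, 128)
    print(features.shape)
    print(mpjpe(torch.rand(1, 17, 3), torch.rand(1, 17, 3)))
```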
format Article
id doaj-art-a501f5492ba644938c4d651426fbb5e4
institution Kabale University
issn 1475-925X
language English
publishDate 2025-02-01
publisher BMC
record_format Article
series BioMedical Engineering OnLine
affiliation KITE Research Institute, Toronto Rehabilitation Institute – University Health Network (all authors)
title The effect of depth data and upper limb impairment on lightweight monocular RGB human pose estimation models
topic Pose estimation
Depth data
Motion capture
Rehabilitation
Stroke
url https://doi.org/10.1186/s12938-025-01347-y