A human pose estimation network based on YOLOv8 framework with efficient multi-scale receptive field and expanded feature pyramid network

Abstract Deep learning-based human pose estimation uses deep neural networks to accurately detect and estimate human body poses in images or videos. However, traditional multi-person pose estimation methods struggle with partial occlusions and overlaps between multiple human bodies and body parts. To address these issues, we propose EE-YOLOv8, a human pose estimation network based on the YOLOv8 framework that integrates an Efficient Multi-scale Receptive Field (EMRF) module and an Expanded Feature Pyramid Network (EFPN). First, the EMRF module further enhances the model’s feature representation capability. Second, the EFPN optimizes cross-level information exchange and improves multi-scale feature integration. Finally, Wise-IoU replaces the traditional Intersection over Union (IoU) to improve detection accuracy through more precise overlap measurement between predicted and ground-truth bounding boxes. Evaluated on the MS COCO 2017 dataset, EE-YOLOv8 achieves an AP of 89.0% at an IoU threshold of 0.5 (3.3 percentage points above YOLOv8-Pose) and an AP of 65.6% over the IoU range 0.5–0.95 (5.8 percentage points above YOLOv8-Pose), while maintaining the lowest parameter count and computational complexity among all compared algorithms. These results demonstrate that EE-YOLOv8 is highly competitive with other mainstream methods.
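For context on the reported accuracy figures, the sketch below shows standard IoU between two axis-aligned boxes and COCO-style AP averaging over the IoU thresholds 0.50–0.95. The per-threshold evaluator `ap_at_threshold` is a hypothetical placeholder, and the paper's Wise-IoU weighting is not reproduced here; this is only a minimal illustration of the metrics named in the abstract.

```python
def iou(box_a, box_b):
    """Standard IoU between axis-aligned boxes given as (x1, y1, x2, y2).

    Shown for context only: the paper swaps this plain IoU for Wise-IoU,
    whose weighting scheme is not reproduced in this record.
    """
    # Intersection rectangle (empty if the boxes are disjoint)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coco_style_ap(ap_at_threshold):
    """Mean AP over IoU thresholds 0.50:0.05:0.95, COCO-style.

    `ap_at_threshold` is a hypothetical per-threshold evaluator, not an
    API from the paper or from any specific library.
    """
    thresholds = [0.50 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95
    return sum(ap_at_threshold(t) for t in thresholds) / len(thresholds)
```

For example, identical boxes give `iou == 1.0` and disjoint boxes give `0.0`; the abstract's "AP over the IoU range 0.5–0.95" corresponds to the ten-threshold average computed by `coco_style_ap`.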


Bibliographic Details
Main Authors: Shaobin Cai, Han Xu, Wanchen Cai, Yuchang Mo, Liansuo Wei
Format: Article
Language: English
Published: Nature Portfolio, 2025-05-01
Series: Scientific Reports
Subjects: Pose estimation; Attention mechanism; Feature pyramid network
Online Access: https://doi.org/10.1038/s41598-025-00259-0
collection DOAJ
id doaj-art-47c7fb1fc5a449199f3f178baff69f03
issn 2045-2322
Author affiliations: Shaobin Cai (College of Information Engineering, Huzhou University); Han Xu (College of Information Engineering, Huzhou University); Wanchen Cai (College of Management, National Taiwan University); Yuchang Mo (College of Mathematics, Huaqiao University); Liansuo Wei (College of Information Engineering, Suqian University)
topic Pose estimation
Attention mechanism
Feature pyramid network
url https://doi.org/10.1038/s41598-025-00259-0