Learning spatio-temporal context for basketball action pose estimation with a multi-stream network

Bibliographic Details
Main Authors: Zhihao Zhang, Wenyue Liu, Yuan Zheng, Linkang Du, Lezhong Sun
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Reports
Subjects: Sports pose estimation; Feature fusion; Deep Learning; Computer vision
Online Access: https://doi.org/10.1038/s41598-025-14985-y
collection DOAJ
description Abstract: Accurate athlete pose estimation in basketball is crucial for game analysis, player training, and tactical decision-making. However, existing pose estimation methods struggle with challenges common in basketball, such as motion blur, occlusions, and complex backgrounds. To tackle these issues, this paper proposes a basketball action pose estimation framework that first uses a multi-dimensional data stream network to extract spatial, temporal, and contextual information separately. Specifically, the spatial stream branch extracts multi-scale features and captures the spatial pose information of players in single-frame images through feature fusion and spatial attention mechanisms. The temporal stream branch merges feature maps from adjacent frames, capturing player motion across consecutive frames. The context stream branch generates a global context feature vector that encodes the entire image, offering a holistic perspective for pose estimation. Next, a feature fusion module integrates early, late, and hybrid fusion strategies to make full use of the multi-modal information. Finally, a stage-wise streaming training module progressively improves the model’s accuracy and generalization ability over three stages. Experimental results demonstrate that the proposed framework significantly improves the accuracy and robustness of basketball action pose estimation, particularly in highly dynamic scenes with complex backgrounds.
id doaj-art-8fbe05af2db8403da056198ed893428d
institution DOAJ
issn 2045-2322
spelling doaj-art-8fbe05af2db8403da056198ed893428d
Record timestamp: 2025-08-20T03:04:38Z
Language code: eng
Journal: Scientific Reports (Nature Portfolio), ISSN 2045-2322
Issue: 2025-08-01, volume 15, issue 1, pages 1-19
DOI: 10.1038/s41598-025-14985-y
Author affiliations:
Zhihao Zhang: Faculty of Education, Universiti Kebangsaan Malaysia
Wenyue Liu: Faculty of Education, Universiti Kebangsaan Malaysia
Yuan Zheng: Faculty of Marxism, Xinyang Normal University
Linkang Du: Xi’an Jiaotong University
Lezhong Sun: Shandong Vocational University of Foreign Affairs
Keywords: Sports pose estimation; Feature fusion; Deep Learning; Computer vision
topic Sports pose estimation
Feature fusion
Deep Learning
Computer vision