Learning spatio-temporal context for basketball action pose estimation with a multi-stream network
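Abstract: Accurate athlete pose estimation in basketball is crucial for game analysis, player training, and tactical decision-making. However, existing pose estimation methods struggle to effectively address common challenges in basketball, such as motion blur, occlusions, and complex backgrounds. To tackle these issues, this paper proposes a basketball action pose estimation framework, which first leverages a multi-dimensional data stream network to extract spatial, temporal, and contextual information separately. Specifically, the spatial stream branch aims to extract multi-scale features and captures the spatial pose information of players in single-frame images through feature fusion and spatial attention mechanisms. The temporal stream branch merges feature maps with adjacent frames, effectively capturing player motion information across consecutive frames. The context stream branch generates a global context feature vector that encodes the entire image, offering a holistic perspective for pose estimation. Subsequently, we designed a feature fusion module that integrates early fusion, late fusion, and hybrid fusion strategies to fully utilize multi-modal information. Finally, we introduced a stage-wise streaming training module that progressively enhances the model’s accuracy and generalization ability through three stages. Experimental results demonstrate that the proposed framework significantly improves the accuracy and robustness of basketball action pose estimation, particularly excelling in scenarios with high dynamics and complex backgrounds.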
| Main Authors: | Zhihao Zhang, Wenyue Liu, Yuan Zheng, Linkang Du, Lezhong Sun |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | Scientific Reports |
| Subjects: | Sports pose estimation; Feature fusion; Deep Learning; Computer vision |
| Online Access: | https://doi.org/10.1038/s41598-025-14985-y |
| Field | Value |
|---|---|
| author | Zhihao Zhang; Wenyue Liu; Yuan Zheng; Linkang Du; Lezhong Sun |
| collection | DOAJ |
| description | Abstract Accurate athlete pose estimation in basketball is crucial for game analysis, player training, and tactical decision-making. However, existing pose estimation methods struggle to effectively address common challenges in basketball, such as motion blur, occlusions, and complex backgrounds. To tackle these issues, this paper proposes a basketball action pose estimation framework, which first leverages a multi-dimensional data stream network to extract spatial, temporal, and contextual information separately. Specifically, the spatial stream branch aims to extract multi-scale features and captures the spatial pose information of players in single-frame images through feature fusion and spatial attention mechanisms. The temporal stream branch merges feature maps with adjacent frames, effectively capturing player motion information across consecutive frames. The context stream branch generates a global context feature vector that encodes the entire image, offering a holistic perspective for pose estimation. Subsequently, we designed a feature fusion module that integrates early fusion, late fusion, and hybrid fusion strategies to fully utilize multi-modal information. Finally, we introduced a stage-wise streaming training module that progressively enhances the model’s accuracy and generalization ability through three stages. Experimental results demonstrate that the proposed framework significantly improves the accuracy and robustness of basketball action pose estimation, particularly excelling in scenarios with high dynamics and complex backgrounds. |
| format | Article |
| id | doaj-art-8fbe05af2db8403da056198ed893428d |
| institution | DOAJ |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| affiliation | Zhihao Zhang, Wenyue Liu: Faculty of Education, Universiti Kebangsaan Malaysia; Yuan Zheng: Faculty of Marxism, Xinyang Normal University; Linkang Du: Xi’an Jiaotong University; Lezhong Sun: Shandong Vocational University of Foreign Affairs |
| title | Learning spatio-temporal context for basketball action pose estimation with a multi-stream network |
| topic | Sports pose estimation; Feature fusion; Deep Learning; Computer vision |
| url | https://doi.org/10.1038/s41598-025-14985-y |
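
The description above names three streams (spatial, temporal, context) and three fusion strategies (early, late, hybrid) but gives no formulas. The following is a minimal, dependency-free Python sketch of what those terms commonly mean, not the authors' implementation. It rests on loudly stated assumptions: each stream yields a flat feature vector, the context vector is plain global average pooling, the temporal merge is an unweighted average with the neighbouring frames, every head is a bias-free linear map, and hybrid fusion is a weighted blend of the early and late outputs. The helper names (`linear`, `global_context`, `temporal_merge`, `early_fusion`, `late_fusion`, `hybrid_fusion`) and the blending weight `alpha` are hypothetical.

```python
"""Toy sketch of early, late, and hybrid fusion over three feature streams.

Assumptions (not taken from the paper): each stream yields a flat feature
vector, the context vector is global average pooling, the temporal merge is
a plain average with neighbouring frames, every head is a linear map, and
hybrid fusion is a weighted blend of the early and late outputs.
"""
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Sequence[float]) -> Vector:
    """Bias-free linear head: one output value per weight row."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the input length")
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def global_context(feature_map: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical context stream: average each channel over all positions.

    feature_map is indexed [position][channel].
    """
    return [sum(col) / len(feature_map) for col in zip(*feature_map)]


def temporal_merge(frames: Sequence[Sequence[float]], t: int) -> Vector:
    """Hypothetical temporal stream: average frame t with frames t-1 and t+1,
    skipping neighbours that fall outside the clip."""
    window = [frames[i] for i in (t - 1, t, t + 1) if 0 <= i < len(frames)]
    return [sum(col) / len(window) for col in zip(*window)]


def early_fusion(streams: Sequence[Sequence[float]], head: Matrix) -> Vector:
    """Concatenate all stream features first, then apply one shared head."""
    fused = [v for stream in streams for v in stream]
    return linear(head, fused)


def late_fusion(streams: Sequence[Sequence[float]], heads: Sequence[Matrix]) -> Vector:
    """Give each stream its own head, then average the per-stream predictions."""
    preds = [linear(head, stream) for head, stream in zip(heads, streams)]
    return [sum(col) / len(preds) for col in zip(*preds)]


def hybrid_fusion(
    streams: Sequence[Sequence[float]],
    shared_head: Matrix,
    heads: Sequence[Matrix],
    alpha: float = 0.5,
) -> Vector:
    """Blend both strategies: alpha * early + (1 - alpha) * late."""
    early = early_fusion(streams, shared_head)
    late = late_fusion(streams, heads)
    return [alpha * a + (1 - alpha) * b for a, b in zip(early, late)]


if __name__ == "__main__":
    spatial = [1.0, 2.0]  # stand-in for pooled spatial-stream features
    temporal = temporal_merge([[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]], t=1)
    context = global_context([[1.0, 0.0], [3.0, 2.0]])
    streams = [spatial, temporal, context]

    shared_head = [[1.0] * 6]  # one output: the sum of all six features
    heads = [[[1.0, 1.0]], [[1.0, 1.0]], [[1.0, 1.0]]]  # each sums its own stream
    print(temporal)                                    # [3.0, 3.0]
    print(context)                                     # [2.0, 1.0]
    print(early_fusion(streams, shared_head))          # [12.0]
    print(late_fusion(streams, heads))                 # [4.0]
    print(hybrid_fusion(streams, shared_head, heads))  # [8.0]
```

In this toy setup, early fusion lets a single head weigh features across streams, while late fusion keeps each stream's head independent so one unreliable stream contributes only its share of the average; `alpha` slides between the two. How the paper actually combines its streams, and what its three training stages do, is not specified in this record.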