Sports video temporal action detection technology based on an improved MSST algorithm

Sports videos contain a large number of irrelevant backgrounds and static frames, which affect the efficiency and accuracy of temporal action detection. To optimize sports video data processing and temporal action detection, an improved multi-level spatiotemporal transformer network model is propose...

Bibliographic Details
Main Authors: Lai Lixin, Fang Yu
Format: Article
Language: English
Published: De Gruyter 2025-07-01
Series: Nonlinear Engineering
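Subjects: sports videos; temporal action detection; multiple time scales; feature pyramid network; spatiotemporal transformer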
Online Access: https://doi.org/10.1515/nleng-2025-0143
_version_ 1849721579446992896
author Lai Lixin
Fang Yu
collection DOAJ
description Sports videos contain many irrelevant backgrounds and static frames, which reduce the efficiency and accuracy of temporal action detection. To optimize sports video data processing and temporal action detection, an improved multi-level spatiotemporal transformer network model is proposed. The model first optimizes the initial feature extraction of videos through an unsupervised video data preprocessing model based on deep residual networks. Multi-scale features are then generated through feature pyramid networks, and the global spatiotemporal dependencies of actions are captured by a spatiotemporal encoder. A frame-level self-attention module further extracts keyframes and highlights temporal features, thereby improving detection accuracy. The accuracy of the proposed model started at 0.6, reached 0.85 after 300 iterations, and peaked close to 0.9 after 500 iterations. The mAP of the improved model on the dataset reached 90.5%, compared with 78.2% for the base model. The recall was 92.0%, the precision was 89.5%, and the computation time was 220 ms. The model also shows balanced performance across different types of sports, especially in recognizing complex movements such as gymnastics and diving. Through the collaboration of its modules, the model improves the efficiency and accuracy of temporal action detection and demonstrates good applicability and robustness.
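The frame-level self-attention step named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with identity query/key projections over per-frame feature vectors, and it scores each frame by the average attention it receives from all frames. The function names (`attention_weights`, `frame_scores`, `select_keyframes`) and the toy feature values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features):
    """Scaled dot-product attention weights between all pairs of frames.

    Assumption: queries and keys are the raw frame features
    (identity projections), which a trained model would replace
    with learned linear maps.
    """
    scale = math.sqrt(len(features[0]))
    weights = []
    for q in features:
        logits = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights.append(softmax(logits))
    return weights


def frame_scores(features):
    """Score frame j by the mean attention it receives from every frame."""
    weights = attention_weights(features)
    n = len(features)
    return [sum(weights[i][j] for i in range(n)) / n for j in range(n)]


def select_keyframes(features, k):
    """Return the indices of the k highest-scoring frames in temporal order."""
    scores = frame_scores(features)
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(top)


# Toy input: three near-static frames and one frame with a strong response.
frames = [[0.1, 0.0], [0.1, 0.0], [2.0, 1.5], [0.1, 0.1]]
print(select_keyframes(frames, k=1))
```

In this toy input the third frame has by far the largest feature magnitude, so every frame attends to it most strongly and it is the one selected. A real detector would learn the projections and combine these scores with the multi-scale features rather than thresholding them directly.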
format Article
id doaj-art-d709bbf4ccc64c7088d852a1b8fc403f
institution DOAJ
issn 2192-8029
language English
publishDate 2025-07-01
publisher De Gruyter
record_format Article
series Nonlinear Engineering
spelling doaj-art-d709bbf4ccc64c7088d852a1b8fc403f
2025-08-20T03:11:37Z
eng
De Gruyter
Nonlinear Engineering
2192-8029
2025-07-01
141112333
10.1515/nleng-2025-0143
Sports video temporal action detection technology based on an improved MSST algorithm
Lai Lixin 0
Fang Yu 1
School of Physical Education and Sports, Shenzhen Institute of Information Technology, Shenzhen, 518172, China
School of Marxism, Shenzhen Institute of Information Technology, Shenzhen, 518172, China
https://doi.org/10.1515/nleng-2025-0143
sports videos
temporal action detection
multiple time scales
feature pyramid network
spatiotemporal transformer
title Sports video temporal action detection technology based on an improved MSST algorithm
topic sports videos
temporal action detection
multiple time scales
feature pyramid network
spatiotemporal transformer
url https://doi.org/10.1515/nleng-2025-0143