YOLO-Act: Unified Spatiotemporal Detection of Human Actions Across Multi-Frame Sequences

Automated action recognition has become essential in surveillance, healthcare, and multimedia retrieval owing to the rapid proliferation of video data. This paper introduces YOLO-Act, a novel spatiotemporal action detection model that extends the object detection capabilities of YOLOv8 to efficiently manage complex action dynamics within video sequences. YOLO-Act achieves precise and efficient action recognition by integrating keyframe extraction, action tracking, and class fusion. By adaptively selecting three keyframes representing the beginning, middle, and end of an action, the model captures essential temporal dynamics without the computational overhead of continuous frame processing. Compared with state-of-the-art approaches such as the Lagrangian Action Recognition Transformer (LART), YOLO-Act exhibits superior performance with a mean average precision (mAP) of 73.28 in experiments conducted on the AVA dataset, a gain of +28.18 mAP. Furthermore, YOLO-Act achieves this higher accuracy with significantly lower FLOPs, demonstrating its computational efficiency. The results highlight the advantages of combining precise tracking, effective spatial detection, and temporal consistency to address the challenges of video-based action detection.
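The abstract describes a three-keyframe scheme (beginning, middle, and end of an action) followed by a class-fusion step. The sketch below illustrates those two stages in isolation; it is a minimal illustration, not the paper's implementation. The fusion rule is not specified in the record, so uniform averaging of per-keyframe class scores is assumed, and all function names are hypothetical.

```python
import numpy as np

def select_keyframes(frame_indices):
    """Pick three keyframes (start, middle, end) of an action segment,
    following the three-keyframe scheme described in the abstract."""
    start = frame_indices[0]
    middle = frame_indices[len(frame_indices) // 2]
    end = frame_indices[-1]
    return [start, middle, end]

def fuse_class_scores(per_keyframe_scores):
    """Fuse per-keyframe class score vectors into one action prediction.
    Averaging is an assumption here; the record does not name the rule."""
    fused = np.mean(np.stack(per_keyframe_scores), axis=0)
    return int(np.argmax(fused)), fused

# Example: an action spanning frames 40..79, scored over 3 action classes
keys = select_keyframes(list(range(40, 80)))
scores = [np.array([0.2, 0.7, 0.1]),   # scores at the start keyframe
          np.array([0.1, 0.8, 0.1]),   # scores at the middle keyframe
          np.array([0.3, 0.5, 0.2])]   # scores at the end keyframe
label, fused = fuse_class_scores(scores)
print(keys, label)
```

In a full pipeline each keyframe would be passed through the detector, with tracking associating the resulting boxes across keyframes before fusion.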

Bibliographic Details
Main Authors: Nada Alzahrani, Ouiem Bchir, Mohamed Maher Ben Ismail
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Sensors
Subjects: action detection; keyframe extraction; fusion technique; spatiotemporal information; you only look once (YOLO)
Online Access: https://www.mdpi.com/1424-8220/25/10/3013
author Nada Alzahrani
Ouiem Bchir
Mohamed Maher Ben Ismail
collection DOAJ
description Automated action recognition has become essential in the surveillance, healthcare, and multimedia retrieval industries owing to the rapid proliferation of video data. This paper introduces YOLO-Act, a novel spatiotemporal action detection model that enhances the object detection capabilities of YOLOv8 to efficiently manage complex action dynamics within video sequences. YOLO-Act achieves precise and efficient action recognition by integrating keyframe extraction, action tracking, and class fusion. The model depicts essential temporal dynamics without the computational overhead of continuous frame processing by leveraging the adaptive selection of three keyframes representing the beginning, middle, and end of the actions. Compared with state-of-the-art approaches such as the Lagrangian Action Recognition Transformer (LART), YOLO-Act exhibits superior performance with a mean average precision (mAP) of 73.28 in experiments conducted on the AVA dataset, resulting in a gain of +28.18 mAP. Furthermore, YOLO-Act achieves this higher accuracy with significantly lower FLOPs, demonstrating its efficiency in computational resource utilization. The results highlight the advantages of incorporating precise tracking, effective spatial detection, and temporal consistency to address the challenges associated with video-based action detection.
format Article
id doaj-art-4de1d8d2e2af42c3a1233c8259109176
institution Kabale University
issn 1424-8220
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-4de1d8d2e2af42c3a1233c8259109176 (2025-08-20T03:47:57Z). eng. MDPI AG. Sensors, ISSN 1424-8220, 2025-05-01, vol. 25, no. 10, art. 3013. DOI: 10.3390/s25103013. YOLO-Act: Unified Spatiotemporal Detection of Human Actions Across Multi-Frame Sequences. Nada Alzahrani, Ouiem Bchir, Mohamed Maher Ben Ismail: Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia. Online access: https://www.mdpi.com/1424-8220/25/10/3013. Keywords: action detection; keyframe extraction; fusion technique; spatiotemporal information; you only look once (YOLO).
topic action detection
keyframe extraction
fusion technique
spatiotemporal information
you only look once (YOLO)
url https://www.mdpi.com/1424-8220/25/10/3013