Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition

Abstract Human action recognition research has become increasingly sophisticated in recent years because it is used in many applications, such as surveillance, sports analysis, and robotics. Many approaches have been put forward to solve the problem of human action recognition, but a number of challenges, such as occlusions, background clutter, environmental variations, camera motion, and intra- and inter-class similarities, still need to be resolved. Effectively capturing and combining spatial and temporal information is essential for describing a video in action recognition. In this paper, we address human action recognition by combining handcrafted spatio-temporal features with deep spatial features. We propose a novel method for recognizing human actions in video that combines handcrafted spatio-temporal texture features, extracted by our proposed descriptor, Volume Local Derivative Gradient Ternary Patterns (VLDGTP), with deep spatial features extracted from a modified Inception-v4 network. To reduce dimensionality and obtain equal-sized feature vectors, we apply PCA to both types of features; the reduced vectors are then concatenated and passed to an SVM classifier for action recognition. Extensive experiments are carried out on three benchmark datasets: KTH, UCF-101, and HMDB-51. Our proposed HAR method (VLDGTP + DEEP_FEATURES) outperforms existing HAR methods on all three datasets, achieving accuracies of 98.33% on KTH, 97.10% on UCF-101, and 87.50% on HMDB-51, demonstrating its superior performance in action recognition.
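The pipeline described above has a simple fusion-and-classification core: VLDGTP texture features and Inception-v4 spatial features are each reduced with PCA to a common length, concatenated, and classified with an SVM. The sketch below illustrates only that stage, assuming precomputed per-video feature arrays; the library (scikit-learn), the feature dimensions, the PCA size k = 128, and the RBF kernel with C = 10 are illustrative assumptions rather than values taken from the paper, and the VLDGTP and Inception-v4 extraction steps are represented by random stand-in data.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_videos, n_classes = 200, 6

    # Stand-ins for per-video features: VLDGTP texture histograms and deep
    # spatial features pooled from an Inception-v4-style backbone (hypothetical
    # dimensions, not taken from the paper).
    vldgtp_feats = rng.random((n_videos, 4096))
    deep_feats = rng.random((n_videos, 1536))
    labels = rng.integers(0, n_classes, size=n_videos)

    # PCA brings both feature types to the same length so they can be fused by
    # simple concatenation, as the abstract describes.
    k = 128
    hand_reduced = PCA(n_components=k).fit_transform(vldgtp_feats)
    deep_reduced = PCA(n_components=k).fit_transform(deep_feats)
    fused = np.hstack([hand_reduced, deep_reduced])

    # The fused vectors are classified with an SVM (kernel and C are placeholders).
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(fused, labels)
    print("training accuracy:", clf.score(fused, labels))

In a full experiment the PCA models and the scaler would be fitted on the training split only and reused to transform the test videos.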

Bibliographic Details
Main Authors: R. Divya Rani, C. J. Prabhakar
Format: Article
Language: English
Published: Springer Nature 2025-04-01
Series: Human-Centric Intelligent Systems
Subjects: HAR; Action recognition; Human action recognition; Spatio-temporal features; Volume Local Derivative Gradient Ternary Pattern; VLDGTP
Online Access: https://doi.org/10.1007/s44230-025-00095-5
_version_ 1849311141420859392
author R. Divya Rani
C. J. Prabhakar
author_facet R. Divya Rani
C. J. Prabhakar
author_sort R. Divya Rani
collection DOAJ
description Abstract Human action recognition research has become increasingly sophisticated in recent years because it is used in many applications, such as surveillance, sports analysis, and robotics. Many approaches have been put forward to solve the problem of human action recognition, but a number of challenges, such as occlusions, background clutter, environmental variations, camera motion, and intra- and inter-class similarities, still need to be resolved. Effectively capturing and combining spatial and temporal information is essential for describing a video in action recognition. In this paper, we address human action recognition by combining handcrafted spatio-temporal features with deep spatial features. We propose a novel method for recognizing human actions in video that combines handcrafted spatio-temporal texture features, extracted by our proposed descriptor, Volume Local Derivative Gradient Ternary Patterns (VLDGTP), with deep spatial features extracted from a modified Inception-v4 network. To reduce dimensionality and obtain equal-sized feature vectors, we apply PCA to both types of features; the reduced vectors are then concatenated and passed to an SVM classifier for action recognition. Extensive experiments are carried out on three benchmark datasets: KTH, UCF-101, and HMDB-51. Our proposed HAR method (VLDGTP + DEEP_FEATURES) outperforms existing HAR methods on all three datasets, achieving accuracies of 98.33% on KTH, 97.10% on UCF-101, and 87.50% on HMDB-51, demonstrating its superior performance in action recognition.
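The descriptor named in the description, Volume Local Derivative Gradient Ternary Patterns (VLDGTP), builds on ternary-pattern texture coding; its exact volumetric derivative/gradient formulation is defined in the paper and is not reproduced here. Purely as a point of reference, the sketch below shows standard ternary thresholding on a single 3x3 grayscale neighbourhood, in the spirit of local ternary patterns; the threshold t and the neighbour ordering are illustrative choices.

    import numpy as np

    def ternary_codes(patch: np.ndarray, t: float = 5.0) -> tuple[int, int]:
        """Encode a 3x3 grayscale patch as upper/lower binary codes (LTP-style)."""
        center = patch[1, 1]
        # Eight neighbours of the centre pixel, clockwise from the top-left.
        neighbours = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
        upper = (neighbours >= center + t).astype(int)  # +1 states of the ternary code
        lower = (neighbours <= center - t).astype(int)  # -1 states of the ternary code
        weights = 2 ** np.arange(8)
        return int(upper @ weights), int(lower @ weights)

    patch = np.array([[52.0, 60.0, 49.0],
                      [55.0, 54.0, 70.0],
                      [40.0, 54.0, 59.0]])
    print(ternary_codes(patch))  # (26, 68) for this patch with t = 5

Histograms of such codes, accumulated over local volumes of a video rather than single frames, are the kind of handcrafted spatio-temporal representation that the method concatenates with the deep spatial features.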
format Article
id doaj-art-fccdaff879254cd180aab78131e8a91a
institution Kabale University
issn 2667-1336
language English
publishDate 2025-04-01
publisher Springer Nature
record_format Article
series Human-Centric Intelligent Systems
spelling doaj-art-fccdaff879254cd180aab78131e8a91a (indexed 2025-08-20T03:53:31Z)
Language: English; Publisher: Springer Nature; Series: Human-Centric Intelligent Systems; ISSN: 2667-1336
Published: 2025-04-01; Vol. 5, No. 1, pp. 123–150; DOI: 10.1007/s44230-025-00095-5
Title: Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
Authors: R. Divya Rani; C. J. Prabhakar (both: Department of P.G. Studies and Research in Computer Science, Kuvempu University)
Abstract: Human action recognition research has become increasingly sophisticated in recent years because it is used in many applications, such as surveillance, sports analysis, and robotics. Many approaches have been put forward to solve the problem of human action recognition, but a number of challenges, such as occlusions, background clutter, environmental variations, camera motion, and intra- and inter-class similarities, still need to be resolved. Effectively capturing and combining spatial and temporal information is essential for describing a video in action recognition. In this paper, we address human action recognition by combining handcrafted spatio-temporal features with deep spatial features. We propose a novel method for recognizing human actions in video that combines handcrafted spatio-temporal texture features, extracted by our proposed descriptor, Volume Local Derivative Gradient Ternary Patterns (VLDGTP), with deep spatial features extracted from a modified Inception-v4 network. To reduce dimensionality and obtain equal-sized feature vectors, we apply PCA to both types of features; the reduced vectors are then concatenated and passed to an SVM classifier for action recognition. Extensive experiments are carried out on three benchmark datasets: KTH, UCF-101, and HMDB-51. Our proposed HAR method (VLDGTP + DEEP_FEATURES) outperforms existing HAR methods on all three datasets, achieving accuracies of 98.33% on KTH, 97.10% on UCF-101, and 87.50% on HMDB-51, demonstrating its superior performance in action recognition.
Online Access: https://doi.org/10.1007/s44230-025-00095-5
Keywords: HAR; Action recognition; Human action recognition; Spatio-temporal features; Volume Local Derivative Gradient Ternary Pattern; VLDGTP
spellingShingle R. Divya Rani
C. J. Prabhakar
Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
Human-Centric Intelligent Systems
HAR
Action recognition
Human action recognition
Spatio-temporal features
Volume Local Derivative Gradient Ternary Pattern
VLDGTP
title Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
title_full Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
title_fullStr Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
title_full_unstemmed Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
title_short Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
title_sort combining handcrafted spatio temporal and deep spatial features for effective human action recognition
topic HAR
Action recognition
Human action recognition
Spatio-temporal features
Volume Local Derivative Gradient Ternary Pattern
VLDGTP
url https://doi.org/10.1007/s44230-025-00095-5
work_keys_str_mv AT rdivyarani combininghandcraftedspatiotemporalanddeepspatialfeaturesforeffectivehumanactionrecognition
AT cjprabhakar combininghandcraftedspatiotemporalanddeepspatialfeaturesforeffectivehumanactionrecognition