Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
Abstract: Human action recognition research has become increasingly sophisticated in recent years because of its many applications, such as surveillance, sports analysis, and robotics. Many approaches have been proposed to solve the problem of human action recognition, but a number of...
| Main Authors: | R. Divya Rani, C. J. Prabhakar |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer Nature, 2025-04-01 |
| Series: | Human-Centric Intelligent Systems |
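| Subjects: | HAR; Action recognition; Human action recognition; Spatio-temporal features; Volume Local Derivative Gradient Ternary Pattern; VLDGTP |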
| Online Access: | https://doi.org/10.1007/s44230-025-00095-5 |
| _version_ | 1849311141420859392 |
|---|---|
| author | R. Divya Rani; C. J. Prabhakar |
| collection | DOAJ |
| description | Abstract: Human action recognition (HAR) research has become increasingly sophisticated in recent years because of its many applications, such as surveillance, sports analysis, and robotics. Many approaches have been proposed to solve the problem of human action recognition, but a number of challenges, such as occlusions, background clutter, environmental variations, camera motion, and intra- and inter-class similarities, still need to be resolved. Effectively collecting and combining spatio-temporal information is essential for describing a video in action recognition. In this paper, we propose a novel method for recognizing human actions in video that combines handcrafted spatio-temporal texture features, extracted by our proposed feature descriptor, Volume Local Derivative Gradient Ternary Patterns (VLDGTP), with deep spatial features extracted from a modified Inception-v4 network. To reduce the dimensionality and obtain equal-sized feature vectors, we apply PCA to both types of features. The dimensionality-reduced feature vectors are then combined and passed to an SVM classifier for action recognition. Extensive experiments are carried out on three benchmark datasets: KTH, UCF-101, and HMDB-51. Our proposed HAR method (VLDGTP + DEEP_FEATURES) outperforms existing HAR methods on all three datasets, achieving an accuracy of 98.33% on KTH, 97.10% on UCF-101, and 87.50% on HMDB-51. |
| format | Article |
| id | doaj-art-fccdaff879254cd180aab78131e8a91a |
| institution | Kabale University |
| issn | 2667-1336 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Springer Nature |
| series | Human-Centric Intelligent Systems |
| spelling | R. Divya Rani and C. J. Prabhakar (both Department of P.G. Studies and Research in Computer Science, Kuvempu University). Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition. Human-Centric Intelligent Systems (ISSN 2667-1336), Springer Nature, vol. 5, no. 1, pp. 123-150, 2025-04-01. https://doi.org/10.1007/s44230-025-00095-5 |
| title | Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition |
| topic | HAR; Action recognition; Human action recognition; Spatio-temporal features; Volume Local Derivative Gradient Ternary Pattern; VLDGTP |
| url | https://doi.org/10.1007/s44230-025-00095-5 |
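
The fusion stage described in the abstract (PCA on each feature type to obtain equal-sized vectors, concatenation, then an SVM) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature matrices are synthetic stand-ins for VLDGTP and Inception-v4 outputs, and the input dimensions (200 and 512), the 16 PCA components, the linear kernel, the standardization step, and the helper name `fuse_features` are all hypothetical choices that the record does not specify.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fuse_features(handcrafted, deep, n_components, pca_pair=None):
    """Reduce both feature sets to n_components with PCA, then concatenate.

    Hypothetical helper. When pca_pair is None, two new PCA models are
    fitted on the given data; otherwise the supplied models are reused,
    so test data is projected with the training-set projections.
    """
    if pca_pair is None:
        pca_pair = (
            PCA(n_components=n_components).fit(handcrafted),
            PCA(n_components=n_components).fit(deep),
        )
    fused = np.hstack([
        pca_pair[0].transform(handcrafted),
        pca_pair[1].transform(deep),
    ])
    return fused, pca_pair


# Synthetic stand-ins for per-video descriptors (assumed shapes, not real data).
rng = np.random.default_rng(0)
n_videos, n_classes = 120, 3
labels = rng.integers(0, n_classes, size=n_videos)
handcrafted = rng.normal(size=(n_videos, 200)) + labels[:, None]  # "VLDGTP-like"
deep = rng.normal(size=(n_videos, 512)) + labels[:, None]         # "CNN-like"

# Simple holdout split: first 90 videos for training, last 30 for testing.
train, test = np.arange(0, 90), np.arange(90, 120)
X_train, pcas = fuse_features(handcrafted[train], deep[train], n_components=16)
X_test, _ = fuse_features(handcrafted[test], deep[test], 16, pca_pair=pcas)

# Linear SVM on the fused vectors (the kernel choice is an assumption).
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X_train, labels[train])
accuracy = clf.score(X_test, labels[test])
print(f"fused shape: {X_train.shape}, holdout accuracy: {accuracy:.2f}")
```

Fitting the PCA models on the training split only and reusing them for the test split keeps test data out of the projection. Reducing both feature types to the same number of components is what yields the equal-sized vectors the abstract mentions.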