A New Efficient Hybrid Technique for Human Action Recognition Using 2D Conv-RBM and LSTM with Optimized Frame Selection
| Main Authors: | Majid Joudaki, Mehdi Imani, Hamid R. Arabnia |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-02-01 |
| Series: | Technologies |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2227-7080/13/2/53 |
| author | Majid Joudaki, Mehdi Imani, Hamid R. Arabnia |
|---|---|
| collection | DOAJ |
| description | Recognizing human actions through video analysis has gained significant attention in applications like surveillance, sports analytics, and human–computer interaction. While deep learning models such as 3D convolutional neural networks (CNNs) and recurrent neural networks (RNNs) deliver promising results, they often struggle with computational inefficiencies and inadequate spatial–temporal feature extraction, hindering scalability to larger datasets or high-resolution videos. To address these limitations, we propose a novel model combining a two-dimensional convolutional restricted Boltzmann machine (2D Conv-RBM) with a long short-term memory (LSTM) network. The 2D Conv-RBM efficiently extracts spatial features such as edges, textures, and motion patterns while preserving spatial relationships and reducing parameters via weight sharing. These features are subsequently processed by the LSTM to capture temporal dependencies across frames, enabling effective recognition of both short- and long-term action patterns. Additionally, a smart frame selection mechanism minimizes frame redundancy, significantly lowering computational costs without compromising accuracy. Evaluation on the KTH, UCF Sports, and HMDB51 datasets demonstrated superior performance, achieving accuracies of 97.3%, 94.8%, and 81.5%, respectively. Compared to traditional approaches like 2D RBM and 3D CNN, our method offers notable improvements in both accuracy and computational efficiency, presenting a scalable solution for real-time applications in surveillance, video security, and sports analytics. |
| format | Article |
| id | doaj-art-d3b709e1bcab4971b49b2a5608ef606f |
| institution | OA Journals |
| issn | 2227-7080 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | MDPI AG |
| series | Technologies |
| doi | 10.3390/technologies13020053 |
| citation | Technologies, vol. 13, no. 2, article 53 (2025-02-01) |
| affiliations | Majid Joudaki: Electrical and Computer Engineering, University of Kashan, Kashan 8731753153, Iran; Mehdi Imani: Department of Computer and System Sciences, Stockholm University, 10691 Stockholm, Sweden; Hamid R. Arabnia: School of Computing, University of Georgia, Athens GA 30602, USA |
| title | A New Efficient Hybrid Technique for Human Action Recognition Using 2D Conv-RBM and LSTM with Optimized Frame Selection |
| topic | action recognition; convolutional restricted Boltzmann machine; long short-term memory; spatial–temporal feature extraction; video processing |
| url | https://www.mdpi.com/2227-7080/13/2/53 |
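The abstract states that a frame selection mechanism removes redundant frames before feature extraction, but this record does not say how redundancy is measured. As a minimal illustration only, the sketch below assumes a simple pixel-difference criterion, which is not necessarily the authors' method: a frame is kept only when its mean absolute difference from the most recently kept frame exceeds a threshold. The function name `select_frames` and the `threshold` value are hypothetical.

```python
from typing import List

import numpy as np


def select_frames(frames: np.ndarray, threshold: float = 10.0) -> List[int]:
    """Return indices of frames kept by a hypothetical redundancy filter.

    frames: array of shape (T, H, W) or (T, H, W, C).
    A frame is kept when its mean absolute pixel difference from the
    last *kept* frame exceeds `threshold`; the first frame is always kept.
    """
    if len(frames) == 0:
        return []
    # Cast to float so uint8 subtraction cannot wrap around.
    frames = frames.astype(np.float32)
    kept = [0]
    for i in range(1, len(frames)):
        diff = np.abs(frames[i] - frames[kept[-1]]).mean()
        if diff > threshold:
            kept.append(i)
    return kept
```

Comparing each frame with the last kept frame, rather than with its immediate predecessor, means a slow, gradual change still accumulates until it crosses the threshold instead of being skipped indefinitely.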
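The parameter reduction that the abstract attributes to weight sharing in the 2D Conv-RBM can be seen in how hidden units are inferred. The sketch below assumes the standard binary convolutional RBM formulation, in which each of K small filters is shared across every spatial position of the frame, so that P(h^k_ij = 1 | v) = σ((W^k ⋆ v)_ij + b_k). The function name, array shapes, and binary-unit assumption are illustrative; this record does not give the paper's exact layer configuration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def conv_rbm_hidden_probs(v: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hidden-unit activation probabilities for a binary conv-RBM (sketch).

    v: (H, Wd) visible frame
    W: (K, r, r) filters, each shared across all positions
    b: (K,) one hidden bias per filter
    returns: (K, H - r + 1, Wd - r + 1) probabilities
    """
    # All r x r patches of the frame: shape (H - r + 1, Wd - r + 1, r, r).
    windows = sliding_window_view(v, W.shape[1:])
    # Valid cross-correlation of every patch with every filter
    # (equivalent to convolution with a flipped filter).
    pre = np.einsum("ijab,kab->kij", windows, W) + b[:, None, None]
    return sigmoid(pre)
```

Because the filters are shared, the layer's parameter count is K·r·r + K no matter how large the frame is; with, for example, K = 24 filters of size 7 × 7 (hypothetical values), that is 24·49 + 24 = 1,200 parameters.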