A New Efficient Hybrid Technique for Human Action Recognition Using 2D Conv-RBM and LSTM with Optimized Frame Selection

Bibliographic Details
Main Authors: Majid Joudaki, Mehdi Imani, Hamid R. Arabnia
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: Technologies
Subjects:
Online Access: https://www.mdpi.com/2227-7080/13/2/53
collection DOAJ
description Recognizing human actions through video analysis has gained significant attention in applications like surveillance, sports analytics, and human–computer interaction. While deep learning models such as 3D convolutional neural networks (CNNs) and recurrent neural networks (RNNs) deliver promising results, they often struggle with computational inefficiencies and inadequate spatial–temporal feature extraction, hindering scalability to larger datasets or high-resolution videos. To address these limitations, we propose a novel model combining a two-dimensional convolutional restricted Boltzmann machine (2D Conv-RBM) with a long short-term memory (LSTM) network. The 2D Conv-RBM efficiently extracts spatial features such as edges, textures, and motion patterns while preserving spatial relationships and reducing parameters via weight sharing. These features are subsequently processed by the LSTM to capture temporal dependencies across frames, enabling effective recognition of both short- and long-term action patterns. Additionally, a smart frame selection mechanism minimizes frame redundancy, significantly lowering computational costs without compromising accuracy. Evaluation on the KTH, UCF Sports, and HMDB51 datasets demonstrated superior performance, achieving accuracies of 97.3%, 94.8%, and 81.5%, respectively. Compared to traditional approaches like 2D RBM and 3D CNN, our method offers notable improvements in both accuracy and computational efficiency, presenting a scalable solution for real-time applications in surveillance, video security, and sports analytics.
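The description does not say how the frame selection mechanism decides which frames to keep. As a minimal sketch only, assuming a simple frame-differencing criterion (an illustrative stand-in, not confirmed as the authors' method), redundant frames could be dropped before feature extraction as follows. The names `mean_abs_diff`, `select_frames`, and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence

# A grayscale frame represented as rows of pixel intensities in [0, 1].
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_frames(frames: Sequence[Frame], threshold: float = 0.1) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    Assumption: a frame is redundant when its mean absolute difference
    from the most recently kept frame is at or below `threshold`.
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame as the reference
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[kept[-1]], frames[i]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    still = [[0.0, 0.0], [0.0, 0.0]]
    nearly_still = [[0.0, 0.0], [0.0, 0.1]]
    moved = [[1.0, 1.0], [1.0, 1.0]]
    print(select_frames([still, nearly_still, moved, moved]))
```

In a pipeline like the one described, only the kept frames would be passed to the spatial feature extractor and then, in order, to the LSTM; the threshold trades computational cost against the risk of discarding informative frames.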
id doaj-art-d3b709e1bcab4971b49b2a5608ef606f
institution OA Journals
issn 2227-7080
spelling doaj-art-d3b709e1bcab4971b49b2a5608ef606f
Timestamp: 2025-08-20T02:03:30Z
Journal: Technologies (MDPI AG), ISSN 2227-7080, vol. 13, no. 2, article 53, published 2025-02-01
DOI: 10.3390/technologies13020053
Author affiliations:
Majid Joudaki: Electrical and Computer Engineering, University of Kashan, Kashan 8731753153, Iran
Mehdi Imani: Department of Computer and System Sciences, Stockholm University, 10691 Stockholm, Sweden
Hamid R. Arabnia: School of Computing, University of Georgia, Athens GA 30602, USA
topic action recognition
convolutional restricted Boltzmann machine
long short-term memory
spatial–temporal feature extraction
video processing