Comparative Analysis of Fine-Tuning I3D and SlowFast Networks for Action Recognition in Surveillance Videos

Bibliographic Details
Main Authors: T. Gopalakrishnan, Naynika Wason, Raguru Jaya Krishna, Vamshi Krishna B, N. Krishnaraj
Format: Article
Language: English
Published: MDPI AG, 2024-01-01
Series: Engineering Proceedings
Online Access: https://www.mdpi.com/2673-4591/59/1/203
Collection: DOAJ
Description: Human action recognition is a critical and persistently challenging problem in computer vision, especially in video surveillance applications. State-of-the-art classifiers proposed for this problem are computationally expensive to train and require very large amounts of data. In this paper, we address the scarcity of data and computing resources in surveillance settings by applying transfer learning: we fine-tune the Inflated 3D CNN (I3D) model and the SlowFast Network model to automatically extract features from surveillance videos in the SPHAR dataset and classify them into their respective action classes. This approach suits the spatio-temporal nature of video data. Fine-tuning is performed by replacing each network's last classification (dense) layer with one sized to the number of classes in the newly constructed dataset. Finally, we compare the performance of the two fine-tuned networks using accuracy as the metric and find that the I3D model performs better for our use case.
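The fine-tuning step described above (swapping the pretrained classification layer for one sized to the new class count) can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes the backbone (I3D or SlowFast) is frozen and its pooled clip features are precomputed, substitutes synthetic Gaussian clusters for those features, and assumes a 400-way pretrained head (as for a Kinetics-400 checkpoint) that is replaced by a head for a hypothetical 8-class dataset. Whether the authors also update deeper layers is not stated in this record; here only the new head is trained, with plain gradient descent on softmax cross-entropy, and accuracy is reported as the comparison metric. All names, sizes, and hyperparameters are hypothetical.

```python
import numpy as np

# Hypothetical sizes (not from the paper):
FEATURE_DIM = 64        # width of the frozen backbone's pooled clip embedding
PRETRAIN_CLASSES = 400  # assumed Kinetics-400-style pretrained head
NUM_CLASSES = 8         # class count of a hypothetical surveillance dataset


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class LinearHead:
    """Final dense (classification) layer: logits = x @ W + b."""

    def __init__(self, in_features, num_classes, rng):
        self.W = rng.normal(0.0, 0.01, size=(in_features, num_classes))
        self.b = np.zeros(num_classes)

    def __call__(self, x):
        return x @ self.W + self.b


def replace_head(old_head, num_classes, rng):
    """Discard the pretrained head and build a freshly initialized one with
    the same input width (so it still fits the backbone) but a new output size."""
    in_features = old_head.W.shape[0]
    return LinearHead(in_features, num_classes, rng)


def train_head(head, feats, labels, lr=0.05, epochs=200):
    """Full-batch gradient descent on softmax cross-entropy.
    Only the head's W and b change; the backbone features are fixed inputs."""
    n = feats.shape[0]
    onehot = np.eye(head.W.shape[1])[labels]
    losses = []
    for _ in range(epochs):
        probs = softmax(head(feats))
        losses.append(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))
        grad = (probs - onehot) / n  # gradient of the mean loss w.r.t. logits
        head.W -= lr * feats.T @ grad
        head.b -= lr * grad.sum(axis=0)
    return losses


def accuracy(head, feats, labels):
    """Fraction of clips whose highest-scoring class matches the label."""
    return float(np.mean(head(feats).argmax(axis=1) == labels))


rng = np.random.default_rng(0)
pretrained_head = LinearHead(FEATURE_DIM, PRETRAIN_CLASSES, rng)

# Synthetic stand-in for frozen-backbone features: one Gaussian cluster per class.
centers = rng.normal(size=(NUM_CLASSES, FEATURE_DIM))
labels = rng.integers(0, NUM_CLASSES, size=256)
feats = centers[labels] + 0.1 * rng.normal(size=(256, FEATURE_DIM))

head = replace_head(pretrained_head, NUM_CLASSES, rng)
losses = train_head(head, feats, labels)
print("logits shape:", head(feats).shape)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"training accuracy: {accuracy(head, feats, labels):.2f}")
```

In a deep-learning framework the same step is typically a one-line swap of the model's final linear module, optionally followed by freezing the remaining parameters. Training only the head keeps the cost low when labeled data and compute are scarce, which is the motivation given in the abstract.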
ISSN: 2673-4591
DOI: 10.3390/engproc2023059203
Citation: Engineering Proceedings 2024, 59(1), 203
Affiliations:
T. Gopalakrishnan: Department of Information Technology, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India
Naynika Wason: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
Raguru Jaya Krishna: Department of Computer Science and Engineering, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India
Vamshi Krishna B: Department of Computer Science and Engineering, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India
N. Krishnaraj: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
Subjects: human action recognition; fine-tuning; deep learning; surveillance; convolutional neural network; SPHAR