Design of an Improved Model for Anomaly Detection in CCTV Systems Using Multimodal Fusion and Attention-Based Networks

Traditional approaches to video analysis often mischaracterize anomalies: they typically rely on single-modality input and handle complex temporal patterns poorly. This paper addresses these limitations by proposing a comprehensive scheme for multimodal Closed-Circuit Television (CCTV) video...


Bibliographic Details
Main Authors: V. Srilakshmi, Sai Babu Veesam, Mallu Shiva Rama Krishna, Ravi Kumar Munaganuri, Dulam Devee Sivaprasad
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Anomaly detection; deep learning; multimodal fusion; temporal context modeling; unsupervised learning
Online Access: https://ieeexplore.ieee.org/document/10876563/
_version_ 1849775150918008832
author V. Srilakshmi
Sai Babu Veesam
Mallu Shiva Rama Krishna
Ravi Kumar Munaganuri
Dulam Devee Sivaprasad
author_sort V. Srilakshmi
collection DOAJ
description Traditional approaches to video analysis often mischaracterize anomalies: they typically rely on single-modality input and handle complex temporal patterns poorly. This paper addresses these limitations by proposing a comprehensive scheme for multimodal Closed-Circuit Television (CCTV) video analysis. The techniques employed comprise the Multimodal Deep Boltzmann Machine (MDBM), the Multimodal Variational Autoencoder (MVAE) and attention-based fusion networks, all of which fully exploit the learned representations. The MDBM learns shared representations from heterogeneous data sources, the MVAE captures the inherent distribution of the multiple modalities, and the attention mechanism in the fusion networks emphasizes the most informative features. Temporal context is then modeled with Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCNs) and Transformer networks with temporal encoding. LSTMs capture long-range dependencies in sequential data, TCNs model temporal patterns efficiently with convolutional layers, and Transformers weigh the relative importance of temporal features against one another through self-attention, improving detection accuracy for anomalies that unfold over long durations. The proposed models yield substantial improvements in anomaly-detection performance: accuracy improved by 5% with the MDBM, the false-positive rate fell by 15% with the MVAE, the F1-score rose by more than 10% with the attentive fusion network, reconstruction error dropped by 20% with a Deep Convolutional Autoencoder (DCA), detection precision improved by 12% with Adversarially Learned Inference (ALI), and the Area Under the Curve (AUC) gained 8% with Deep InfoMax (DIM).
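The attention-based fusion and temporal modeling described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes pre-extracted per-modality frame features, uses a single linear attention scorer over modalities and an LSTM temporal head, omits the MDBM/MVAE representation-learning stages, and the class names AttentionFusion and TemporalAnomalyScorer are hypothetical.

```python
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Scores each modality embedding and returns an attention-weighted sum."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, n_modalities, dim)
        weights = torch.softmax(self.score(embeddings), dim=1)  # (batch, n_modalities, 1)
        return (weights * embeddings).sum(dim=1)                # (batch, dim)


class TemporalAnomalyScorer(nn.Module):
    """Fuses per-frame modality features, then scores each frame with an LSTM head."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.fusion = AttentionFusion(dim)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, modal_feats: torch.Tensor) -> torch.Tensor:
        # modal_feats: (batch, time, n_modalities, dim)
        b, t, m, d = modal_feats.shape
        fused = self.fusion(modal_feats.reshape(b * t, m, d)).reshape(b, t, d)
        context, _ = self.lstm(fused)            # temporal context per frame
        return self.head(context).squeeze(-1)    # (batch, time) anomaly scores in [0, 1]


# Toy usage: 2 clips, 16 frames, 3 modalities (e.g. RGB, optical flow, audio), 256-d features.
scores = TemporalAnomalyScorer(dim=256)(torch.randn(2, 16, 3, 256))
print(scores.shape)  # torch.Size([2, 16])
```

In the fuller pipeline the abstract describes, the LSTM head could be swapped for a TCN or a Transformer encoder with temporal positional encoding to capture longer-range dependencies.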
format Article
id doaj-art-a738371587e642d080f9bda39a4bf619
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-a738371587e642d080f9bda39a4bf619, updated 2025-08-20T03:01:31Z. English, IEEE, IEEE Access, ISSN 2169-3536, 2025-01-01, vol. 13, pp. 27287-27309, DOI 10.1109/ACCESS.2025.3536501, IEEE Xplore document 10876563. Authors: V. Srilakshmi (https://orcid.org/0000-0002-2058-0781), Sai Babu Veesam (https://orcid.org/0009-0000-5473-4681), Mallu Shiva Rama Krishna (https://orcid.org/0009-0007-8950-0288), Ravi Kumar Munaganuri (https://orcid.org/0000-0001-6629-2315) and Dulam Devee Sivaprasad, all of the School of Computer Science and Engineering, VIT-AP University, Amaravati, India. Keywords: Anomaly detection; deep learning; multimodal fusion; temporal context modeling; unsupervised learning. Online access: https://ieeexplore.ieee.org/document/10876563/
title Design of an Improved Model for Anomaly Detection in CCTV Systems Using Multimodal Fusion and Attention-Based Networks
topic Anomaly detection
deep learning
multimodal fusion
temporal context modeling
unsupervised learning
url https://ieeexplore.ieee.org/document/10876563/