Design of an Improved Model for Anomaly Detection in CCTV Systems Using Multimodal Fusion and Attention-Based Networks

Bibliographic Details
Main Authors:	V. Srilakshmi, Sai Babu Veesam, Mallu Shiva Rama Krishna, Ravi Kumar Munaganuri, Dulam Devee Sivaprasad
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Anomaly detection; deep learning; multimodal fusion; temporal context modeling; unsupervised learning
Online Access:	https://ieeexplore.ieee.org/document/10876563/

Description
Summary:	Traditional approaches to video analysis often mischaracterize anomalies: they usually rely on a single input modality and handle complex temporal patterns poorly. This paper addresses these limitations by proposing a comprehensive scheme for multimodal Closed-Circuit Television (CCTV) video analysis. The scheme combines a Multimodal Deep Boltzmann Machine (MDBM), a Multimodal Variational Autoencoder (MVAE), and Attention-based Fusion Networks, all of which make full use of the learned representations. The MDBM learns shared representations from heterogeneous data sources, the MVAE captures the underlying distribution of the modalities, and the attention mechanism in the fusion networks emphasizes the most important features. Temporal context is modeled with Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCN), and Transformer networks with temporal encoding. LSTM captures long-range dependencies in sequential data, TCN models temporal patterns efficiently using convolutional layers, and Transformer networks weigh the relative importance of temporal features against one another through self-attention, which improves detection accuracy for anomalies that unfold over a long duration. The proposed models improve anomaly detection performance. In particular, accuracy improved by 5% with MDBM, the false positive rate fell by 15% with MVAE, the F1-score improved by more than 10% with the attentive fusion network, reconstruction error fell by 20% with the Deep Convolutional Autoencoder (DCA), detection precision improved by 12% with Adversarially Learned Inference (ALI), and Area Under the Curve (AUC) rose by 8% with Deep InfoMax (DIM).
ISSN:	2169-3536
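
Illustrative note: the summary says that attention in the fusion networks stresses important features across modalities. The sketch below shows one common way such a step can work, using scaled dot-product attention to weight per-modality feature vectors before summing them. It is not the paper's architecture. The function names, the modality names ("video", "audio"), the fixed query vector, and the feature values are all assumptions made up for illustration; in a trained network the query would be learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, query):
    """Fuse per-modality feature vectors into a single vector.

    features: dict mapping a modality name to a feature vector
              (list of floats); all vectors share the query's length.
    query:    hypothetical query vector (fixed here, learned in practice).
    Returns (fused_vector, weights_by_modality).
    """
    names = list(features)
    dim = len(query)
    # Scaled dot-product score of each modality against the query.
    scores = [
        sum(f * q for f, q in zip(features[n], query)) / math.sqrt(dim)
        for n in names
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the modality vectors.
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))

# Made-up features for two hypothetical modalities.
features = {"video": [1.0, 0.0, 2.0, 0.0], "audio": [0.0, 1.0, 0.0, 1.0]}
query = [1.0, 0.0, 1.0, 0.0]
fused, weights = attention_fuse(features, query)
print(weights)
```

With these inputs the video vector aligns with the query and the audio vector does not, so the video modality receives the larger weight and dominates the fused vector.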

Similar Items