Training-Free VLM-Based Pseudo Label Generation for Video Anomaly Detection


Bibliographic Details
Main Authors: Moshira Abdalla, Sajid Javed
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11015429/
Description
Summary: Video anomaly detection in weakly supervised settings remains a challenging task due to the absence of frame-level annotations. To address this, we propose a novel training-free pseudo-label generation module (TFPLG) for Weakly Supervised Video Anomaly Detection (WSVAD), which leverages the vision-language alignment of the pre-trained CLIP model to generate pseudo-labels without requiring any training. Unlike prior methods that depend on learned classifiers, our approach employs a threshold-guided similarity-matching mechanism to produce both fine-grained and coarse-grained pseudo-labels. The framework adopts a triple-branch architecture: the first branch generates pseudo-labels, while the second and third perform coarse-grained binary and fine-grained categorical classification. Temporal modeling is enhanced through the integration of transformers and Graph Convolutional Networks (GCNs) to capture both short- and long-range dependencies. Experiments on UCF-Crime and XD-Violence demonstrate the effectiveness of our approach, achieving a 1.4% average precision gain on XD-Violence compared to leading pseudo-labeling methods, and a 1.6% improvement in anomaly AUC on UCF-Crime over the best existing approaches. In zero-shot testing on the new MSAD dataset, our framework achieves a 3.24% AUC improvement, highlighting its robustness and adaptability. The source code is publicly available at: https://github.com/MoshiraAbdalla/TFPLG_VAD
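The threshold-guided similarity matching described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: real frame and prompt embeddings would come from a frozen CLIP image/text encoder, whereas here they are plain NumPy arrays, and the threshold `tau`, the `normal_idx` convention, and the function name are all assumptions made for the example.

```python
import numpy as np

def pseudo_labels(frame_feats, class_feats, tau=0.5, normal_idx=0):
    """Threshold-guided similarity matching (illustrative sketch).

    frame_feats: (T, D) per-frame embeddings (e.g. from a frozen CLIP
                 image encoder).
    class_feats: (C, D) text-prompt embeddings, one per category, with
                 row `normal_idx` reserved for the "normal" prompt.
    Returns fine-grained per-frame class indices and a coarse binary
    (normal/anomalous) pseudo-label per frame.
    """
    # L2-normalize so that dot products are cosine similarities
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    c = class_feats / np.linalg.norm(class_feats, axis=1, keepdims=True)
    sim = f @ c.T                       # (T, C) frame-to-prompt similarities

    fine = sim.argmax(axis=1)           # fine-grained categorical label
    best = sim.max(axis=1)
    # coarse label: anomalous iff the best-matching prompt is an anomaly
    # category AND the similarity clears the threshold tau
    coarse = ((fine != normal_idx) & (best > tau)).astype(int)
    return fine, coarse
```

No classifier is trained at any point; the labels follow directly from similarity scores and a threshold, which is what makes the module training-free.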
ISSN:2169-3536