Training-Free VLM-Based Pseudo Label Generation for Video Anomaly Detection
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11015429/ |
| Summary: | Video anomaly detection in weakly supervised settings remains a challenging task due to the absence of frame-level annotations. To address this, we propose a novel training-free pseudo-label generation module (TFPLG) for Weakly Supervised Video Anomaly Detection (WSVAD), which leverages the vision-language alignment of the pre-trained CLIP model to generate pseudo-labels without requiring any training. Unlike prior methods that depend on learned classifiers, our approach employs a threshold-guided similarity-matching mechanism to produce both fine-grained and coarse-grained pseudo-labels. The framework adopts a triple-branch architecture: the first branch generates pseudo-labels, while the second and third perform coarse-grained binary and fine-grained categorical classification. Temporal modeling is enhanced through the integration of transformers and Graph Convolutional Networks (GCNs) to capture both short- and long-range dependencies. Experiments on UCF-Crime and XD-Violence demonstrate the effectiveness of our approach, achieving a 1.4% average precision gain on XD-Violence compared to leading pseudo-labeling methods, and a 1.6% improvement in anomaly AUC on UCF-Crime over the best existing approaches. In zero-shot testing on the new MSAD dataset, our framework achieves a 3.24% AUC improvement, highlighting its robustness and adaptability. The source code is publicly available at: https://github.com/MoshiraAbdalla/TFPLG_VAD (an illustrative sketch of the similarity-matching step follows this record). |
|---|---|
| ISSN: | 2169-3536 |
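
The summary describes a threshold-guided similarity-matching step: per-frame CLIP visual features are compared against CLIP text embeddings of anomaly-class prompts, and frames whose best match exceeds a threshold receive both a coarse-grained (binary) and a fine-grained (class-level) pseudo-label. The sketch below is a minimal illustration of that idea only; the feature dimensions, threshold value, and class count are assumptions for demonstration, not values from the paper, and the pre-extracted CLIP features are stood in by random tensors.

```python
# Minimal sketch of threshold-guided similarity matching for pseudo-label
# generation. Assumes CLIP frame features and class-prompt text features
# are already extracted (random tensors stand in here). The threshold and
# the class count are illustrative assumptions, not the paper's settings.
import torch
import torch.nn.functional as F

def generate_pseudo_labels(frame_feats, text_feats, threshold=0.25):
    """frame_feats: (T, D) CLIP visual features for T frames.
    text_feats:  (C, D) CLIP text features for C anomaly-class prompts.
    Returns fine-grained per-frame class indices (-1 = normal) and a
    coarse-grained binary per-frame anomaly label."""
    frame_feats = F.normalize(frame_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sim = frame_feats @ text_feats.t()               # (T, C) cosine similarities
    max_sim, cls = sim.max(dim=-1)                   # best-matching prompt per frame
    coarse = (max_sim > threshold).long()            # binary anomaly pseudo-label
    fine = torch.where(coarse.bool(), cls, torch.full_like(cls, -1))
    return fine, coarse

if __name__ == "__main__":
    T, C, D = 64, 13, 512                            # frames, classes, feature dim (assumed)
    fine, coarse = generate_pseudo_labels(torch.randn(T, D), torch.randn(C, D))
    print(fine.shape, coarse.float().mean().item())
```

In the full framework these pseudo-labels would feed the coarse-grained binary and fine-grained categorical branches; the transformer and GCN temporal modeling described in the summary is omitted from this sketch.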