Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training

Surveillance cameras have recently been installed in many locations to maintain public safety. However, because abnormal events rarely occur, it is tedious for security personnel to continuously monitor the resulting video. This study therefore aims to develop an unsupervised anomaly detection method for surveillance videos based on predicting the future frame and its optical flow magnitude with adversarial networks. Our database consists of two public anomaly detection datasets: UCSD Pedestrian 2 (UCSD Ped2) and ShanghaiTech Campus. The proposed network consists of a generator and a U-Net-based discriminator. The generator has an encoder based on a video vision transformer and two decoders that independently predict the future frame and the optical flow magnitude from feature maps extracted from the encoder at different resolutions. The U-Net-based discriminator identifies, at the pixel level, whether an image is a "real" frame or a "fake" frame predicted by the generator. The generator and the U-Net-based discriminator are trained adversarially. Finally, anomaly detection is performed based on the element-wise differences between the predicted future frame and optical flow magnitude and their corresponding ground-truth frames. The areas under the receiver operating characteristic curve (AUC) with the proposed network were 97.5% for UCSD Ped2 and 85.8% for ShanghaiTech Campus, exceeding those of well-known anomaly detection methods.

Bibliographic Details
Main Authors: Shimpei Kobayashi, Akiyoshi Hizukuri, Ryohei Nakayama
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10942323/
_version_ 1849737014961766400
author Shimpei Kobayashi
Akiyoshi Hizukuri
Ryohei Nakayama
author_facet Shimpei Kobayashi
Akiyoshi Hizukuri
Ryohei Nakayama
author_sort Shimpei Kobayashi
collection DOAJ
description Surveillance cameras have recently been installed in many locations to maintain public safety. However, because abnormal events rarely occur, it is tedious for security personnel to continuously monitor the resulting video. This study therefore aims to develop an unsupervised anomaly detection method for surveillance videos based on predicting the future frame and its optical flow magnitude with adversarial networks. Our database consists of two public anomaly detection datasets: UCSD Pedestrian 2 (UCSD Ped2) and ShanghaiTech Campus. The proposed network consists of a generator and a U-Net-based discriminator. The generator has an encoder based on a video vision transformer and two decoders that independently predict the future frame and the optical flow magnitude from feature maps extracted from the encoder at different resolutions. The U-Net-based discriminator identifies, at the pixel level, whether an image is a "real" frame or a "fake" frame predicted by the generator. The generator and the U-Net-based discriminator are trained adversarially. Finally, anomaly detection is performed based on the element-wise differences between the predicted future frame and optical flow magnitude and their corresponding ground-truth frames. The areas under the receiver operating characteristic curve (AUC) with the proposed network were 97.5% for UCSD Ped2 and 85.8% for ShanghaiTech Campus, exceeding those of well-known anomaly detection methods.
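The scoring step described above — turning element-wise differences between predictions and ground truth into a per-frame anomaly score, then evaluating with AUC — can be sketched as follows. This is a generic illustration, not the authors' exact implementation: the PSNR-style error, the per-video min-max normalisation, and the blending weight `alpha` between appearance and motion cues are common choices in future-frame-prediction anomaly detection and are assumptions here.

```python
import numpy as np

def psnr_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """PSNR between a predicted and a ground-truth frame
    (pixel values assumed in [0, 1]); higher PSNR = more 'normal'."""
    mse = float(np.mean((pred - gt) ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(1.0 / mse)

def anomaly_scores(frame_errors, flow_errors, alpha=0.5):
    """Min-max normalise each error sequence over a video and blend the
    appearance and motion cues; higher score = more anomalous."""
    def norm(x):
        x = np.asarray(x, dtype=np.float64)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return alpha * norm(frame_errors) + (1.0 - alpha) * norm(flow_errors)

def auc(scores, labels):
    """Frame-level AUC via the rank statistic: the probability that a
    random anomalous frame scores higher than a random normal frame."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()  # ties count half
    return wins / (len(pos) * len(neg))

# Toy example: frame 2 has large prediction errors and is labelled anomalous.
frame_err = [0.10, 0.20, 0.90, 0.15]
flow_err = [0.05, 0.10, 0.80, 0.10]
labels = [0, 0, 1, 0]
scores = anomaly_scores(frame_err, flow_err)
```

In this sketch the anomalous frame receives the highest blended score, so the toy AUC is 1.0; in practice the errors come from the generator's two decoder outputs, one sequence per video.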
format Article
id doaj-art-479a0de409bb492f97b2905058678f2c
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-479a0de409bb492f97b2905058678f2c
updated 2025-08-20T03:07:05Z
language eng
publisher IEEE
series IEEE Access (ISSN 2169-3536)
published 2025-01-01, vol. 13, pp. 56169-56179
doi 10.1109/ACCESS.2025.3554813 (IEEE article 10942323)
title Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training
authors Shimpei Kobayashi (https://orcid.org/0009-0003-0925-0532), Akiyoshi Hizukuri (https://orcid.org/0000-0001-5963-3900), Ryohei Nakayama (https://orcid.org/0000-0003-0181-7496); all at the Graduate School of Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan
abstract See the description field above.
url https://ieeexplore.ieee.org/document/10942323/
keywords Generative adversarial networks; surveillance videos; U-Net-based discriminator; unsupervised anomaly detection; video vision transformer
spellingShingle Shimpei Kobayashi
Akiyoshi Hizukuri
Ryohei Nakayama
Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training
IEEE Access
Generative adversarial networks
surveillance videos
U-Net-based discriminator
unsupervised anomaly detection
video vision transformer
title Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training
title_full Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training
title_fullStr Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training
title_full_unstemmed Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training
title_short Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training
title_sort unsupervised video anomaly detection using video vision transformer and adversarial training
topic Generative adversarial networks
surveillance videos
U-Net-based discriminator
unsupervised anomaly detection
video vision transformer
url https://ieeexplore.ieee.org/document/10942323/
work_keys_str_mv AT shimpeikobayashi unsupervisedvideoanomalydetectionusingvideovisiontransformerandadversarialtraining
AT akiyoshihizukuri unsupervisedvideoanomalydetectionusingvideovisiontransformerandadversarialtraining
AT ryoheinakayama unsupervisedvideoanomalydetectionusingvideovisiontransformerandadversarialtraining