Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training
Surveillance cameras have recently been introduced in various locations to maintain public safety. However, it is tedious for security personnel to keep watching videos obtained by surveillance cameras because abnormal events rarely occur. Consequently, this study aims to develop an unsupervised anomaly detection method for surveillance videos based on prediction of the future frame and optical flow magnitude using adversarial networks.
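The final detection step described in this record's abstract (element-wise subtraction between the predicted future frame / optical flow magnitude and their ground truths) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `anomaly_scores`, the mean-squared per-frame error, the min-max normalization, and the weighting parameter `alpha` are all assumptions for the sketch.

```python
import numpy as np

def anomaly_scores(pred_frames, gt_frames, pred_flow_mag, gt_flow_mag, alpha=0.5):
    """Frame-level anomaly scores from element-wise prediction errors.

    pred_frames, gt_frames:         (T, H, W) predicted / ground-truth frames.
    pred_flow_mag, gt_flow_mag:     (T, H, W) optical-flow magnitude maps.
    alpha:                          assumed weight between the two error terms.
    """
    # Element-wise subtraction, squared and averaged over pixels,
    # gives one appearance error and one motion error per frame.
    frame_err = np.mean((pred_frames - gt_frames) ** 2, axis=(1, 2))
    flow_err = np.mean((pred_flow_mag - gt_flow_mag) ** 2, axis=(1, 2))

    # Min-max normalize each error over the video so the two terms
    # are on a comparable scale before combining.
    def norm(e):
        rng = e.max() - e.min()
        return (e - e.min()) / rng if rng > 0 else np.zeros_like(e)

    # Higher score = frame is predicted poorly = more likely anomalous.
    return alpha * norm(frame_err) + (1 - alpha) * norm(flow_err)
```

A frame that the generator fails to predict (large reconstruction error in either appearance or motion) receives a high score and is flagged as anomalous.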
Saved in:
| Main Authors: | Shimpei Kobayashi, Akiyoshi Hizukuri, Ryohei Nakayama |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Generative adversarial networks; surveillance videos; U-Net-based discriminator; unsupervised anomaly detection; video vision transformer |
| Online Access: | https://ieeexplore.ieee.org/document/10942323/ |
| _version_ | 1849737014961766400 |
|---|---|
| author | Shimpei Kobayashi; Akiyoshi Hizukuri; Ryohei Nakayama |
| author_facet | Shimpei Kobayashi; Akiyoshi Hizukuri; Ryohei Nakayama |
| author_sort | Shimpei Kobayashi |
| collection | DOAJ |
| description | Surveillance cameras have recently been introduced in various locations to maintain public safety. However, it is tedious for security personnel to keep watching videos obtained by surveillance cameras because abnormal events rarely occur. Consequently, this study aims to develop an unsupervised anomaly detection method for surveillance videos based on prediction of the future frame and optical flow magnitude using adversarial networks. Our database consists of two public datasets for anomaly detection: UCSD Pedestrian 2 (UCSD Ped2) and ShanghaiTech Campus. The proposed network consists of a generator and a U-Net-based discriminator. The generator has an encoder based on a video vision transformer and two decoders that independently predict the future frame and the optical flow magnitude using feature maps extracted from the encoder at different resolutions. The U-Net-based discriminator identifies at the pixel level whether an image is a “real” frame or a “false” frame predicted by the generator. The generator and the U-Net-based discriminator are trained through adversarial training. Finally, anomaly detection is performed based on element-wise subtraction between the predicted future frame and optical flow magnitude and their corresponding ground-truth frames. The areas under the receiver operating characteristic curve (AUC) with the proposed network were 97.5% for UCSD Ped2 and 85.8% for ShanghaiTech Campus. The AUC for the proposed network was also greater than those of well-known anomaly detection methods. |
| format | Article |
| id | doaj-art-479a0de409bb492f97b2905058678f2c |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-479a0de409bb492f97b2905058678f2c (2025-08-20T03:07:05Z); eng; IEEE; IEEE Access; ISSN 2169-3536; published 2025-01-01; vol. 13, pp. 56169–56179; doi:10.1109/ACCESS.2025.3554813; article no. 10942323. Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training. Shimpei Kobayashi (https://orcid.org/0009-0003-0925-0532); Akiyoshi Hizukuri (https://orcid.org/0000-0001-5963-3900); Ryohei Nakayama (https://orcid.org/0000-0003-0181-7496); all: Graduate School of Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan. https://ieeexplore.ieee.org/document/10942323/ Keywords: Generative adversarial networks; surveillance videos; U-Net-based discriminator; unsupervised anomaly detection; video vision transformer |
| spellingShingle | Shimpei Kobayashi; Akiyoshi Hizukuri; Ryohei Nakayama; Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training; IEEE Access; Generative adversarial networks; surveillance videos; U-Net-based discriminator; unsupervised anomaly detection; video vision transformer |
| title | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_full | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_fullStr | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_full_unstemmed | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_short | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_sort | unsupervised video anomaly detection using video vision transformer and adversarial training |
| topic | Generative adversarial networks; surveillance videos; U-Net-based discriminator; unsupervised anomaly detection; video vision transformer |
| url | https://ieeexplore.ieee.org/document/10942323/ |
| work_keys_str_mv | AT shimpeikobayashi unsupervisedvideoanomalydetectionusingvideovisiontransformerandadversarialtraining AT akiyoshihizukuri unsupervisedvideoanomalydetectionusingvideovisiontransformerandadversarialtraining AT ryoheinakayama unsupervisedvideoanomalydetectionusingvideovisiontransformerandadversarialtraining |
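The record evaluates detection quality by the area under the ROC curve (97.5% on UCSD Ped2, 85.8% on ShanghaiTech Campus). Frame-level AUC of this kind can be computed from per-frame scores and normal/abnormal labels via the Mann–Whitney rank statistic; the sketch below is an illustrative implementation, not the paper's evaluation code, and the function name `roc_auc` is an assumption.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC via the Mann–Whitney U statistic: the probability that a
    randomly chosen abnormal frame scores higher than a normal one."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]   # abnormal frames
    neg = scores[labels == 0]   # normal frames
    # Count pairwise wins of abnormal over normal; ties count half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 1.0 means every abnormal frame outscores every normal frame; 0.5 means the scores carry no ranking information.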