Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training
Surveillance cameras have recently been introduced in various locations to maintain public safety. However, it is tedious for security personnel to keep watching videos obtained by surveillance cameras because abnormal events rarely occur. Consequently, this study aims to develop an unsupervised anomaly detection method for surveillance videos based on prediction of the future frame and optical flow magnitude using adversarial networks.
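The final detection step described in this record's abstract (element-wise subtraction between the predicted future frame / optical flow magnitude and their ground truths) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `anomaly_scores`, the mean-squared per-frame error, the min-max normalization, and the weighting parameter `alpha` are all assumptions for the sketch.

```python
import numpy as np

def anomaly_scores(pred_frames, gt_frames, pred_flow_mag, gt_flow_mag, alpha=0.5):
    """Frame-level anomaly scores from element-wise prediction errors.

    pred_frames, gt_frames:         (T, H, W) predicted / ground-truth frames.
    pred_flow_mag, gt_flow_mag:     (T, H, W) optical-flow magnitude maps.
    alpha:                          assumed weight between the two error terms.
    """
    # Element-wise subtraction, squared and averaged over pixels,
    # gives one appearance error and one motion error per frame.
    frame_err = np.mean((pred_frames - gt_frames) ** 2, axis=(1, 2))
    flow_err = np.mean((pred_flow_mag - gt_flow_mag) ** 2, axis=(1, 2))

    # Min-max normalize each error over the video so the two terms
    # are on a comparable scale before combining.
    def norm(e):
        rng = e.max() - e.min()
        return (e - e.min()) / rng if rng > 0 else np.zeros_like(e)

    # Higher score = frame is predicted poorly = more likely anomalous.
    return alpha * norm(frame_err) + (1 - alpha) * norm(flow_err)
```

A frame that the generator fails to predict (large reconstruction error in either appearance or motion) receives a high score and is flagged as anomalous.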
Saved in:
| Main Authors: | Shimpei Kobayashi, Akiyoshi Hizukuri, Ryohei Nakayama |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Generative adversarial networks; surveillance videos; U-Net-based discriminator; unsupervised anomaly detection; video vision transformer |
| Online Access: | https://ieeexplore.ieee.org/document/10942323/ |
| _version_ | 1849737014961766400 |
|---|---|
| author | Shimpei Kobayashi; Akiyoshi Hizukuri; Ryohei Nakayama |
| author_facet | Shimpei Kobayashi; Akiyoshi Hizukuri; Ryohei Nakayama |
| author_sort | Shimpei Kobayashi |
| collection | DOAJ |
| description | Surveillance cameras have recently been introduced in various locations to maintain public safety. However, it is tedious for security personnel to keep watching videos obtained by surveillance cameras because abnormal events rarely occur. Consequently, this study aims to develop an unsupervised anomaly detection method for surveillance videos based on prediction of the future frame and optical flow magnitude using adversarial networks. Our database consists of two public datasets for anomaly detection: UCSD Pedestrian 2 (UCSD Ped2) and ShanghaiTech Campus. The proposed network consists of a generator and a U-Net-based discriminator. The generator has an encoder based on a video vision transformer and two decoders that independently predict the future frame and the optical flow magnitude using feature maps extracted from the encoder at different resolutions. The U-Net-based discriminator identifies at the pixel level whether an image is a “real” frame or a “false” frame predicted by the generator. The generator and the U-Net-based discriminator are trained through adversarial training. Finally, anomaly detection is performed based on element-wise subtraction between the predicted future frame and optical flow magnitude and their corresponding ground-truth frames. The areas under the receiver operating characteristic curve (AUC) with the proposed network were 97.5% for UCSD Ped2 and 85.8% for ShanghaiTech Campus. The AUC for the proposed network was also greater than those of well-known anomaly detection methods. |
| format | Article |
| id | doaj-art-479a0de409bb492f97b2905058678f2c |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-479a0de409bb492f97b2905058678f2c (2025-08-20T03:07:05Z); eng; IEEE; IEEE Access; ISSN 2169-3536; published 2025-01-01; vol. 13, pp. 56169–56179; doi:10.1109/ACCESS.2025.3554813; article no. 10942323. Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training. Shimpei Kobayashi (https://orcid.org/0009-0003-0925-0532); Akiyoshi Hizukuri (https://orcid.org/0000-0001-5963-3900); Ryohei Nakayama (https://orcid.org/0000-0003-0181-7496); all: Graduate School of Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan. https://ieeexplore.ieee.org/document/10942323/ Keywords: Generative adversarial networks; surveillance videos; U-Net-based discriminator; unsupervised anomaly detection; video vision transformer |
| spellingShingle | Shimpei Kobayashi; Akiyoshi Hizukuri; Ryohei Nakayama; Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training; IEEE Access; Generative adversarial networks; surveillance videos; U-Net-based discriminator; unsupervised anomaly detection; video vision transformer |
| title | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_full | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_fullStr | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_full_unstemmed | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_short | Unsupervised Video Anomaly Detection Using Video Vision Transformer and Adversarial Training |
| title_sort | unsupervised video anomaly detection using video vision transformer and adversarial training |
| topic | Generative adversarial networks; surveillance videos; U-Net-based discriminator; unsupervised anomaly detection; video vision transformer |
| url | https://ieeexplore.ieee.org/document/10942323/ |
| work_keys_str_mv | AT shimpeikobayashi unsupervisedvideoanomalydetectionusingvideovisiontransformerandadversarialtraining AT akiyoshihizukuri unsupervisedvideoanomalydetectionusingvideovisiontransformerandadversarialtraining AT ryoheinakayama unsupervisedvideoanomalydetectionusingvideovisiontransformerandadversarialtraining |
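The record evaluates detection quality by the area under the ROC curve (97.5% on UCSD Ped2, 85.8% on ShanghaiTech Campus). Frame-level AUC of this kind can be computed from per-frame scores and normal/abnormal labels via the Mann–Whitney rank statistic; the sketch below is an illustrative implementation, not the paper's evaluation code, and the function name `roc_auc` is an assumption.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC via the Mann–Whitney U statistic: the probability that a
    randomly chosen abnormal frame scores higher than a normal one."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]   # abnormal frames
    neg = scores[labels == 0]   # normal frames
    # Count pairwise wins of abnormal over normal; ties count half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 1.0 means every abnormal frame outscores every normal frame; 0.5 means the scores carry no ranking information.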