Welcome to Francis Academic Press

Academic Journal of Computing & Information Science, 2023, 6(12); doi: 10.25236/AJCIS.2023.061207.

Video Sequence Anomaly Detection with Multi-Layer Memory-Augmented Autoencoder

Author(s)

Minxiang Long, Pengwei Zhang, Jingxia Chen, Wentao Lin, Yiyi Gao

Corresponding Author

Jingxia Chen

Affiliation(s)

College of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, 710021, China

Abstract

To strengthen the model's ability to learn normal-pattern features in video anomaly detection, this paper proposes an end-to-end method that combines reconstruction and prediction. The method consists of two modules. (1) A multi-layer memory-augmented autoencoder reconstructs RGB frames, using skip connections to compensate for the information lost through the memory modules. (2) A conditional variational autoencoder takes the reconstructed RGB frames from the first module as input and predicts future frames conditioned on the current optical flow, capturing the correlation between optical flow and video frames. Comparative experiments on the Avenue, Ped2, and SHTech datasets show that the hybrid model achieves relatively strong anomaly detection capability.
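The memory read that a memory-augmented autoencoder relies on can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `memory_read`, the feature size, the number of memory items, and the choice of cosine similarity with softmax addressing are all assumptions made for illustration, loosely following the general memory-augmented autoencoder idea.

```python
import numpy as np


def memory_read(query, memory):
    """Soft-address a memory bank and return the retrieved feature.

    query:  (C,) encoder feature (hypothetical shape)
    memory: (N, C) memory items storing normal patterns (hypothetical shape)
    """
    # Cosine similarity between the query and every memory item.
    q = query / (np.linalg.norm(query) + 1e-12)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-12)
    sim = m @ q  # shape (N,)

    # Softmax turns similarities into non-negative weights that sum to 1.
    w = np.exp(sim - sim.max())
    w /= w.sum()

    # The read-out is a weighted combination of the memory items,
    # so the decoder only sees features built from stored patterns.
    return w @ memory, w


# Illustrative values only: 10 memory items with 16-dimensional features.
rng = np.random.default_rng(0)
memory = rng.normal(size=(10, 16))
query = rng.normal(size=16)
read, weights = memory_read(query, memory)
```

Because the read-out is assembled only from stored items, detail in the original feature can be lost, which is the information loss that the skip connections in the first module are meant to compensate for.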

Keywords

video anomaly detection, autoencoder, variational inference, memory-augmented

Cite This Paper

Minxiang Long, Pengwei Zhang, Jingxia Chen, Wentao Lin, Yiyi Gao. Video Sequence Anomaly Detection with Multi-Layer Memory-Augmented Autoencoder. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 12: 59-69. https://doi.org/10.25236/AJCIS.2023.061207.
