Academic Journal of Computing & Information Science, 2024, 7(10); doi: 10.25236/AJCIS.2024.071005.
Xiangtian Li¹, Xiaobo Wang², Zhen Qi³, Han Cao⁴, Zhaoyang Zhang⁵, Ao Xiang⁶
¹University of California San Diego, Electrical and Computer Engineering, San Diego, USA
²Georgia Institute of Technology, Computer Science, Atlanta, USA
³Northeastern University, Electrical and Computer Engineering, Boston, USA
⁴University of California San Diego, Computer Science, San Diego, USA
⁵University of California San Diego, Computational Science, San Diego, USA
⁶Northern Arizona University, Information Security and Assurance, Arizona, USA
Dynamic texture synthesis aims to generate sequences that are visually similar to a reference video texture and exhibit specific stationary properties in time. In this paper, we introduce a spatiotemporal generative adversarial network (DTSGAN) that learns from a single dynamic texture by capturing its motion and content distribution. In the DTSGAN pipeline, a new video sequence is generated progressively, from the coarsest scale to the finest. To avoid mode collapse, we propose a novel data update strategy that improves the diversity of the generated results. Qualitative and quantitative experiments show that our model can generate high-quality dynamic textures and natural motion.
Spatiotemporal generative adversarial network, DTSGAN, deep learning, computer vision
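The coarse-to-fine generation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scale factor, the number of scales, the nearest-neighbour upsampling, and the function names (`pyramid_shapes`, `resize_nearest`, `placeholder_generator`, `coarse_to_fine_sample`) are all assumptions made for illustration, and the per-scale generator is a stand-in for a trained network. The sketch only shows the control flow: start at the coarsest spatiotemporal scale, then repeatedly upsample the previous result, inject fresh noise, and refine.

```python
import numpy as np

def pyramid_shapes(shape, num_scales, factor=0.75):
    """Return (T, H, W) shapes from coarsest to finest; the last equals `shape`."""
    shapes = []
    for i in range(num_scales):
        s = factor ** (num_scales - 1 - i)  # assumed geometric scale schedule
        shapes.append(tuple(max(1, round(d * s)) for d in shape))
    return shapes

def resize_nearest(video, shape):
    """Nearest-neighbour resize of a (T, H, W) array to `shape`."""
    idx = [
        np.minimum((np.arange(n) * video.shape[a] / n).astype(int),
                   video.shape[a] - 1)
        for a, n in enumerate(shape)
    ]
    return video[np.ix_(*idx)]

def placeholder_generator(noise, upsampled_prev):
    """Hypothetical stand-in for a trained per-scale generator (residual form)."""
    return upsampled_prev + 0.1 * noise

def coarse_to_fine_sample(shape, num_scales=4, seed=0):
    """Generate one (T, H, W) sample by refining from coarsest to finest scale."""
    rng = np.random.default_rng(seed)
    shapes = pyramid_shapes(shape, num_scales)
    video = np.zeros(shapes[0])
    for s in shapes:
        video = resize_nearest(video, s)   # upsample the previous scale's output
        noise = rng.standard_normal(s)     # fresh spatiotemporal noise per scale
        video = placeholder_generator(noise, video)
    return video

sample = coarse_to_fine_sample((8, 32, 32))
print(sample.shape)  # (8, 32, 32)
```

In a trained model each scale would have its own learned generator and discriminator; the residual form used here is one common choice in single-sample GANs and is assumed rather than taken from the text above.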
Xiangtian Li, Xiaobo Wang, Zhen Qi, Han Cao, Zhaoyang Zhang, Ao Xiang. DTSGAN: Learning Dynamic Textures via Spatiotemporal Generative Adversarial Network. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 10: 31-40. https://doi.org/10.25236/AJCIS.2024.071005.