Academic Journal of Engineering and Technology Science, 2024, 7(6); doi: 10.25236/AJETS.2024.070621.
Peiqi Yuan1, Yekuan He2
1Physics Department, Capital Normal University, Beijing, 100080, China
2Institute of International Education, Guangzhou College of Technology and Business, Foshan, 528100, China
Abstract: With the advancement of image generation technology, there is growing interest in automating the creation of stylized fonts. Traditionally, designers have had to produce multiple font styles by hand to meet client demands, at significant cost in human and material resources. To address this challenge, we propose the Font Model Manager (FMM). This paper introduces the Type ControlNet and the Type Condition Information Model, which improve the precision of the font generation process and the accuracy of the generated images. FMM also incorporates a Type Image Compression Model, which compresses images to reduce the computation time and storage cost of training, thereby increasing training efficiency. Furthermore, we have built a comprehensive, accurately labeled, high-resolution Typeface Image dataset, filling a gap in the available data. To evaluate the model's effectiveness, we use Peak Signal-to-Noise Ratio (PSNR) as the primary metric, achieving an average of 9.52 dB, which surpasses comparable models on the same dataset while preserving the visual quality of the generated font images. Overall, these advances significantly improve the accuracy and efficiency of stylized font generation, meeting the market's demand for diverse font styles.
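For reference, PSNR compares a generated image against a ground-truth reference through the mean squared error: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value (255 for 8-bit images). The following is a minimal illustrative sketch in Python with NumPy, not the authors' evaluation code; the function name and interface are our own:

import numpy as np

def psnr(reference: np.ndarray, generated: np.ndarray, max_value: float = 255.0) -> float:
    # Mean squared error over all pixels, computed in float64 to avoid uint8 overflow.
    mse = np.mean((reference.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images: PSNR is unbounded
    # PSNR in decibels: 10 * log10(MAX^2 / MSE)
    return 10.0 * np.log10((max_value ** 2) / mse)

Averaging this value over a test set of generated/reference font-image pairs gives a dataset-level figure such as the 9.52 dB reported above.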
Keywords: FMM model, image processing, image compression model, diffusion model, ControlNet
Peiqi Yuan, Yekuan He. Design of art calligraphy image generation based on the diffusion model. Academic Journal of Engineering and Technology Science (2024) Vol. 7, Issue 6: 145-154. https://doi.org/10.25236/AJETS.2024.070621.