Academic Journal of Computing & Information Science, 2025, 8(5); doi: 10.25236/AJCIS.2025.080510.

Comic Image Generation Based on Diffusion Models

Author(s)

Yanling Zhang

Corresponding Author:
Yanling Zhang
Affiliation(s)

Management Science and Information Engineering, Hebei University of Economics and Business, Shijiazhuang, China, 050061

Abstract

With the rapid development of the digital entertainment industry, demand for efficient comic image generation is growing, particularly in animation production, game design, and personalized content creation, where high-quality automated generation techniques have become essential. Traditional comic image generation methods, however, rely on manual drawing or basic image processing, making it difficult to achieve both rich detail and automation, which limits creative flexibility and productivity. To address these challenges, this paper proposes a novel comic image generation model based on diffusion models, the Hand-Drawn Comic Diffusion Model (HD-CDM). By learning complex image distributions, HD-CDM progressively refines noisy images into comics with intricate line work, vibrant colors, and distinctive artistic styles, significantly improving both quality and efficiency while reducing reliance on manual labor and computational resources. This paper also constructs a diverse comic-style image dataset, providing a solid foundation for model training and evaluation and thereby advancing research in this field. Experimental results demonstrate that, compared with existing comic image generation models, HD-CDM achieves superior image realism, stylistic consistency, and creative diversity, offering a novel solution for automated comic image generation. In comic creation, it helps artists quickly generate sketches or concept drawings, and for non-professional creators it lowers the barrier to entry: from a simple text description or a chosen art style, they can quickly generate comic images of professional quality and thus participate more easily in comic creation.
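The progressive refinement the abstract describes follows the standard denoising-diffusion sampling loop: start from pure Gaussian noise and repeatedly subtract a predicted noise component under a fixed variance schedule. The sketch below is a generic DDPM-style sampler, not the paper's HD-CDM; `predict_noise` is a placeholder (returning zeros here so the sketch runs) for the trained noise-prediction network, typically a U-Net.

```python
import numpy as np

def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_t with cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def predict_noise(x_t, t):
    """Placeholder for the trained noise-prediction network.
    A real model would return an estimate of the noise in x_t at step t."""
    return np.zeros_like(x_t)

def ddpm_sample(shape=(8, 8), T=50, seed=0):
    """Draw x_T ~ N(0, I) and denoise it step by step down to x_0."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(shape)  # pure noise at step T
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        # Posterior mean of x_{t-1} given x_t and the predicted noise
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Inject fresh noise at every step except the final one
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean
    return x

img = ddpm_sample()
print(img.shape)
```

With a trained network in place of the zero predictor, the same loop turns random noise into a sample from the learned image distribution; conditioning mechanisms such as ControlNet additionally feed edge maps or sketches into the network at each step.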

Keywords

Diffusion Models, ControlNet, Image Generation, Comics

Cite This Paper

Yanling Zhang. Comic Image Generation Based on Diffusion Models. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 5: 86-93. https://doi.org/10.25236/AJCIS.2025.080510.
