Francis Academic Press

Academic Journal of Computing & Information Science, 2024, 7(4); doi: 10.25236/AJCIS.2024.070410.

A similar Chinese character detection algorithm that fuses Chinese character images and encoding

Author(s)

Chun Wang, Dejun Chen

Corresponding Author

Chun Wang

Affiliation(s)

Wuhan University of Technology, Wuhan City, Hubei Province, China

Abstract

Previous research on similar Chinese characters has usually taken one of two approaches. The first extracts the contours and structures of character images and compares them. The second encodes the pronunciation, form, and meaning of characters and measures the similarity of the resulting strings. However, the similar characters found by such methods are often homophones with different structures. In this paper, we propose a fusion algorithm that combines Chinese character images with Chinese character encodings to extract the structural, stroke, stroke-order, and contour features of Chinese characters. Based on these features, the algorithm finds a list of candidate characters that are closer to the original character in structure and stroke order. The algorithm first selects specific pixel lattices, chosen according to the character's structure type, to measure glyph similarity. Then, using the four-corner encoding of each character, it computes a structural stroke-order similarity with the Jaro-Winkler distance. Finally, the two scores are fused into an overall glyph similarity for the pair of characters. The experimental results show that the fused algorithm combines the advantages of image-based and encoding-based methods. It is better at retrieving similar characters that share the same glyph structure and have a similar stroke order.
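The three-stage pipeline in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the pixel lattices are selected per structure type or how the two scores are fused, so the code makes several simplifying assumptions. Glyph similarity is taken as the intersection-over-union of ink pixels across two whole, equal-sized binary bitmaps. This is a stand-in, not the authors' structure-specific lattice. Code similarity uses the standard Jaro-Winkler similarity, where distance = 1 - similarity, with the common defaults of prefix scale p = 0.1 and a maximum prefix of 4. Fusion uses a hypothetical linear weight `alpha`. The helper names (`glyph_similarity`, `fused_similarity`) and the code strings in the usage example are illustrative only. The strings are not real four-corner codes looked up for specific characters.

```python
from typing import Sequence

Bitmap = Sequence[Sequence[int]]  # rows of 0/1 pixels; 1 = ink (foreground)


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters match only if equal and no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters appearing in a different order count as half-transpositions.
    half_transpositions, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to max_prefix)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def glyph_similarity(a: Bitmap, b: Bitmap) -> float:
    """Hypothetical glyph score: IoU of ink pixels in two equal-sized bitmaps
    (a simplification of the paper's structure-specific pixel lattices)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same shape")
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa and pb:
                inter += 1
            if pa or pb:
                union += 1
    return inter / union if union else 1.0


def fused_similarity(glyph_a: Bitmap, glyph_b: Bitmap,
                     code_a: str, code_b: str, alpha: float = 0.5) -> float:
    """Hypothetical linear fusion of the image score and the encoding score."""
    return alpha * glyph_similarity(glyph_a, glyph_b) + (1 - alpha) * jaro_winkler(code_a, code_b)


if __name__ == "__main__":
    glyph_x = [[1, 1, 1], [0, 1, 0], [1, 1, 1]]
    glyph_y = [[1, 1, 1], [0, 1, 0], [0, 1, 1]]
    # Illustrative four-corner-style code strings, not verified codes of real characters.
    print(round(fused_similarity(glyph_x, glyph_y, "10104", "10103"), 4))  # 0.8886
```

In this toy example the glyph score is 6/7 and the code score is 0.92. The code strings share the four-character prefix "1010", and Jaro-Winkler rewards this, which suits four-corner codes because their leading digits describe the upper corners of the character. A real system would render glyphs from a font and tune `alpha` on labelled data.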

Keywords

Quadrangle encoding, Jaro-Winkler distance, glyph, similarity, Chinese character encoding

Cite This Paper

Chun Wang, Dejun Chen. A similar Chinese character detection algorithm that fuses Chinese character images and encoding. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 4: 72-79. https://doi.org/10.25236/AJCIS.2024.070410.
