Academic Journal of Computing & Information Science, 2023, 6(2); doi: 10.25236/AJCIS.2023.060207.

Study on Deep Learning-Based Natural Scene Text Recognition

Author(s)

Jiashun Weng, Xiaoyun Jia, Yanluo Liu

Corresponding Author:
Jiashun Weng
Affiliation(s)

School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi’an, China

Abstract

Natural scene text recognition algorithms tend to focus on local character classification and ignore the global information of the entire text. To address this problem, a natural scene text recognition algorithm based on multi-network convergence and a multi-head attention mechanism is proposed. First, the algorithm uses a multi-network convergence structure with multiple residual modules to capture contextual and semantic features from the visual features. Then, for character prediction, a multi-head attention encoder is proposed that stitches position information, visual features, contextual features, and semantic features into a new feature space. Finally, the new feature space is reweighted by the self-attention mechanism, which attends to the connections between feature sequences and improves the accuracy of the predicted text. Recognition accuracy reached 91.4% on the regular-text dataset SVT and 82.4% on the irregular-text dataset ICDAR2015, improvements of about 1.8% and 2.4%, respectively, over current popular algorithms. Experimental results show that the model makes better use of position features, global semantic features, and contextual features to identify text content more accurately.
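
The fusion and reweighting steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the feature dimensions, the choice to stitch features by concatenating along the channel axis, and the random projection weights are all assumptions made for illustration. The sketch only shows how four feature sequences could be joined into one feature space and then reweighted by multi-head scaled dot-product self-attention.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_features(position, visual, context, semantic):
    """Stitch four (T, d_i) feature sequences into one (T, sum(d_i)) sequence.

    Concatenation along the channel axis is an assumption of this sketch.
    """
    return np.concatenate([position, visual, context, semantic], axis=-1)


def multi_head_self_attention(x, num_heads, rng):
    """Reweight a (T, D) sequence with multi-head self-attention.

    The projection matrices are random placeholders standing in for
    learned weights (hypothetical, for illustration only).
    """
    T, D = x.shape
    assert D % num_heads == 0, "D must be divisible by num_heads"
    d_head = D // num_heads
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4)
    )

    def split(m):
        # (T, D) -> (num_heads, T, d_head)
        return m.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    heads = weights @ v                                  # (heads, T, d_head)
    merged = heads.transpose(1, 0, 2).reshape(T, D)      # back to (T, D)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
T = 5  # number of sequence positions (hypothetical)
position = rng.standard_normal((T, 8))
visual = rng.standard_normal((T, 8))
context = rng.standard_normal((T, 8))
semantic = rng.standard_normal((T, 8))

fused = fuse_features(position, visual, context, semantic)
out, weights = multi_head_self_attention(fused, num_heads=4, rng=rng)
print(fused.shape, out.shape, weights.shape)
```

Each attention row is a probability distribution over the sequence positions, so every output position is a weighted mixture of all fused feature vectors. This is the sense in which self-attention attends to the connections between feature sequences.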

Keywords

scene text recognition; multi-network convergence; multi-head attention mechanism; feature extraction

Cite This Paper

Jiashun Weng, Xiaoyun Jia, Yanluo Liu. Study on Deep Learning-Based Natural Scene Text Recognition. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 2: 44-52. https://doi.org/10.25236/AJCIS.2023.060207.
