
Academic Journal of Computing & Information Science, 2024, 7(5); doi: 10.25236/AJCIS.2024.070512.

Deep learning-based text recognition in natural scenes

Author(s)

Lizhi Cui, Honglei Tian, Shumin Fei

Corresponding Author

Honglei Tian

Affiliation(s)

Henan Polytechnic University, Jiaozuo, Henan, China

Abstract

The rapid expansion of Internet technology into remote areas has broadened network coverage and facilitated the proliferation of terminal devices. This improved infrastructure increases the data resources that are essential for advancing the intelligence and automation associated with the Fourth Industrial Revolution. However, a significant challenge remains: many older devices are still offline and need to be updated. Scene text recognition, which is crucial for applications such as autonomous driving and traffic sign recognition, is one technology that can support these updates. The shift from traditional statistical methods and support vector machines (SVMs) to deep learning has significantly improved text recognition capabilities. To build on these advances, this article introduces the Pre-Activated Haar Transform-Enhanced Pan model (PAHaar Pan). The model combines a pre-activation design, depthwise separable convolution, and the Haar wavelet transform to enhance feature extraction and generalization while reducing memory usage. It is further strengthened by a multi-level hybrid attention mechanism that improves text recognition performance.
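Two of the building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not say how PAHaar Pan wires these pieces together. The function names (`haar_dwt2`, `standard_conv_params`, `separable_conv_params`), the orthonormal scaling by 1/2, and the sub-band labels are assumptions made for illustration only. The first function applies a single-level 2D Haar transform, which halves the spatial resolution while keeping all information in four sub-bands. The other two compare weight counts (biases ignored) for a standard convolution and a depthwise separable one, which is where the memory saving comes from.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform of a list-of-lists image.

    Assumes even height and width. Returns four half-resolution sub-bands:
    an approximation band and three detail bands (labels are illustrative).
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    approx, detail_cols, detail_rows, detail_diag = [], [], [], []
    for i in range(0, rows, 2):
        r_a, r_c, r_r, r_d = [], [], [], []
        for j in range(0, cols, 2):
            # The 2x2 block of pixels handled in this step.
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2)  # low-pass in both directions
            r_c.append((a - b + c - d) / 2)  # difference across columns
            r_r.append((a + b - c - d) / 2)  # difference across rows
            r_d.append((a - b - c + d) / 2)  # diagonal difference
        approx.append(r_a)
        detail_cols.append(r_c)
        detail_rows.append(r_r)
        detail_diag.append(r_d)
    return approx, detail_cols, detail_rows, detail_diag


def standard_conv_params(kernel, c_in, c_out):
    """Weight count of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weight count of a depthwise convolution followed by a 1x1 pointwise one (no bias)."""
    return kernel * kernel * c_in + c_in * c_out
```

For example, with a 3x3 kernel, 64 input channels, and 128 output channels, `standard_conv_params(3, 64, 128)` gives 73728 weights, whereas `separable_conv_params(3, 64, 128)` gives 8768, roughly an eightfold reduction.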

Keywords

Deep Learning, Text recognition, Natural Scenes

Cite This Paper

Lizhi Cui, Honglei Tian, Shumin Fei. Deep learning-based text recognition in natural scenes. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 5: 94-101. https://doi.org/10.25236/AJCIS.2024.070512.
