Deep Learning-Based Text Detection in Natural Scenes

Lizhi Cui, Honglei Tian, Shumin Fei

doi:10.25236/AJCIS.2024.070508

Academic Journal of Computing & Information Science, 2024, 7(5); doi: 10.25236/AJCIS.2024.070508.

Corresponding Author:

Honglei Tian

Affiliation(s)

Henan Polytechnic University, Jiaozuo, Henan, China

Abstract

Traditional text detection relies mainly on hand-crafted features, which work only in simple environments and generalise poorly. Deep learning improves the generalisation and robustness of detection, but complex scenes remain challenging. Current CNN-based text detection algorithms struggle with large-scale and long-range text because of their limited receptive field and limited spatial information extraction. This paper proposes GMSTNet, a model that combines GhostNet V2, MobileNet V3, and Swin Transformer. It improves efficiency through piecewise nonlinear activation, handles small text and fine-grained features effectively while strengthening both global and local perception, and performs well on multiple datasets.
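As an illustration of how a piecewise activation can reduce computational cost, the sketch below implements hard-swish, the activation introduced with MobileNet V3. It is an assumption that the activation mentioned in the abstract is of this kind; the abstract does not name the exact function GMSTNet uses. Hard-swish replaces the exponential in the sigmoid with a clamp, so it needs only an addition, a clamp, and a multiplication.

```python
def relu6(x: float) -> float:
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear approximation of the sigmoid: ReLU6(x + 3) / 6."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x: float) -> float:
    """Hard-swish as defined for MobileNet V3: x * ReLU6(x + 3) / 6.

    Assumption: shown only as an example of a piecewise activation,
    not as the confirmed activation used in GMSTNet.
    """
    return x * hard_sigmoid(x)


if __name__ == "__main__":
    # Zero for x <= -3, identity for x >= 3, smooth-looking blend in between.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"hard_swish({x:+.1f}) = {hard_swish(x):+.4f}")
```

The function is exactly zero for inputs at or below -3 and exactly the identity for inputs at or above 3, which is what makes it cheap to evaluate on mobile hardware.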

Keywords

Deep Learning, Text Detection, Natural Scenes, Feature Fusion

Cite This Paper

Lizhi Cui, Honglei Tian, Shumin Fei. Deep Learning-Based Text Detection in Natural Scenes. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 5: 65-71. https://doi.org/10.25236/AJCIS.2024.070508.
