Academic Journal of Computing & Information Science, 2022, 5(12); doi: 10.25236/AJCIS.2022.051205.

Study of YOLOX Target Detection Method Based on Stand-Alone Self-Attention


Yanyang Zeng, Zihan Zhou, Yang Yu

Corresponding Author:
Zihan Zhou

College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo City, China


This paper proposes YOLOX_SASA, a target detection method that improves YOLOX with a stand-alone self-attention mechanism, to address complex image backgrounds, slow detection speed, and low detection accuracy. First, a stand-alone self-attention module is introduced into the multi-scale feature fusion part of YOLOX, so that the network enlarges its receptive field while aggregating neighborhood information, which improves detection speed. Second, the binary cross-entropy classification loss (BCE Loss) of YOLOX is replaced with MultiLabelMargin Loss for label complementation, which improves detection accuracy, and CutMix data augmentation is introduced in the training phase to expand the training set and increase the number of samples. Finally, to test the detection effectiveness of the algorithm, simulation experiments are conducted on a self-built small garbage classification dataset and the public PASCAL VOC 2007 dataset. The experimental results show that the method achieves an average accuracy of 93.81% while meeting real-time requirements, which is 4.53% higher than the original YOLOX algorithm.
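
To make the loss substitution concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the multi-label margin loss follows the definition used by PyTorch's `MultiLabelMarginLoss` (a hinge on the score gap between every target class and every non-target class, divided by the number of classes). The function names, the `margin` parameter, and the example scores are illustrative assumptions that do not come from the paper.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, computed from raw logits.

    Each class is treated as an independent yes/no decision.
    """
    total = 0.0
    for x, t in zip(logits, targets):
        # Numerically stable form of
        # -[t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def multilabel_margin(scores, target_indices, margin=1.0):
    """Hinge-style ranking loss for one sample.

    Every target class should outscore every non-target class by at
    least `margin` (hypothetical parameter; assumed to be 1 here).
    """
    targets = set(target_indices)
    total = 0.0
    for j in targets:
        for i in range(len(scores)):
            if i not in targets:
                total += max(0.0, margin - (scores[j] - scores[i]))
    return total / len(scores)


# Illustrative scores for a 4-class problem (made-up values).
scores = [0.2, 0.1, 0.0, -0.3]
print(bce_with_logits(scores, [1.0, 0.0, 0.0, 0.0]))
print(multilabel_margin(scores, [0]))
```

The difference this sketch highlights is that BCE scores each class in isolation, while the margin loss only penalizes a sample when a non-target class comes within the margin of a target class, so it acts on the relative ranking of class scores.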


Keywords: YOLOX; Stand-alone Self-attention; Target Detection; Multi-scale Feature Fusion; Deep Learning

Cite This Paper

Yanyang Zeng, Zihan Zhou, Yang Yu. Study of YOLOX Target Detection Method Based on Stand-Alone Self-Attention. Academic Journal of Computing & Information Science (2022), Vol. 5, Issue 12: 29-37. https://doi.org/10.25236/AJCIS.2022.051205.

