Academic Journal of Computing & Information Science, 2025, 8(2); doi: 10.25236/AJCIS.2025.080205.

Research on Small Target Detection Algorithm Based on Improved YOLOv5

Author(s)

Yi Shi, Lei Ding, Shan Li, Xin Wang

Corresponding Author

Yi Shi

Affiliation(s)

School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, Shaanxi, China

Abstract

In computer vision, small target detection is challenging because the targets are small, occupy few pixels, and carry little contextual information. As a result, traditional algorithms suffer from frequent missed and false detections, so designing efficient and accurate detection algorithms is crucial. This paper proposes YOLO_GF, a small target detection method built on the YOLOv5s framework. It introduces an improved DAMO-YOLO dense association memory mechanism to address the insufficient exchange of information between high- and low-level features during feature fusion. It also integrates an improved efficient multi-scale attention (EMA) module, which uses the wide receptive field of parallel subnetworks to collect multi-scale spatial information and to establish interdependencies among different spatial locations, realizing a cross-spatial learning mechanism that enhances model accuracy. In addition, the feature fusion module is built from a novel convolutional CSPStage, which reduces model inference latency while improving detection accuracy. YOLO_GF performs small target detection end to end and has been validated on the VisDrone2019 dataset, where experimental results demonstrate high detection accuracy and good detection performance.
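
The exchange of information between high- and low-level features mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the YOLO_GF fusion module. It assumes the simplest possible form of fusion: nearest-neighbor upsampling of a coarse (high-level) map, followed by a weighted blend with a fine (low-level) map. The function names `upsample_nearest` and `fuse`, the `weight` parameter, and the toy values are all hypothetical.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest(grid: Grid, factor: int = 2) -> Grid:
    """Repeat every cell `factor` times along both axes."""
    out: Grid = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse(fine: Grid, coarse: Grid, weight: float = 0.5) -> Grid:
    """Blend a fine map with an upsampled coarse map.

    `weight` is an illustrative assumption: it is the share given to the
    coarse (high-level) map, the rest goes to the fine (low-level) map.
    """
    factor = len(fine) // len(coarse)
    up = upsample_nearest(coarse, factor)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map size must be a multiple of coarse map size")
    return [
        [(1 - weight) * f + weight * c for f, c in zip(fine_row, up_row)]
        for fine_row, up_row in zip(fine, up)
    ]


# Toy 4x4 low-level map (precise location) and 2x2 high-level map (semantics).
fine = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
coarse = [
    [4.0, 0.0],
    [0.0, 8.0],
]
fused = fuse(fine, coarse)
for row in fused:
    print(row)
```

In the fused map, each cell combines the fine map's precise location with the coarse map's stronger response over the same region, which is the kind of cross-level exchange that a fusion neck aims to provide for small targets.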

Keywords

Small Target Detection; DAMO-YOLO; Feature Fusion; Cross-Spatial Learning

Cite This Paper

Yi Shi, Lei Ding, Shan Li, Xin Wang. Research on Small Target Detection Algorithm Based on Improved YOLOv5. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 2: 37-44. https://doi.org/10.25236/AJCIS.2025.080205.
