Academic Journal of Computing & Information Science, 2021, 4(5); doi: 10.25236/AJCIS.2021.040511.

Pedestrian detection based on multi-layer feature fusion


Ruolan Deng1, Biqiang Guan2, Ziteng Li3, Jiahao Wang4

Corresponding Author:
Ruolan Deng

1University of Leeds, Leeds, United Kingdom

2Beijing University of Chemical Technology, Beijing, China

3Beijing Information Science and Technology University, Beijing, China

4Zhejiang University, Hangzhou, Zhejiang, China

These authors contributed equally to this work


To address the small target sizes and occlusion that make pedestrian detection difficult, this paper proposes an end-to-end object detection network with multi-scale feature fusion. The algorithm builds on the YOLOv3 network and fully integrates multi-scale features, which strengthens the representation of small targets, improves the robustness of pedestrian detection in complex environments, and raises detection accuracy while preserving real-time performance. In experiments against current mainstream pedestrian detection algorithms, the proposed algorithm improves detection accuracy on the INRIA and KITTI data sets, raising the average accuracy of YOLOv3 on the two data sets by 6% and 24.7%, respectively.
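
As a rough illustration of what fusing multi-scale features can mean in a YOLOv3-style detector, the sketch below upsamples a coarse, deep feature map and concatenates it with a finer, shallow one along the channel axis. It is not the network proposed in this paper, whose layers are not described in the abstract. The names `upsample_nearest` and `fuse`, the nested-list layout, and the toy sizes are all assumptions made for the example.

```python
from typing import List

# Assumed layout for this sketch: feature maps indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int = 2) -> FeatureMap:
    """Enlarge each channel by repeating every value `factor` times per axis."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend([list(wide) for _ in range(factor)])
        out.append(rows)
    return out


def fuse(shallow: FeatureMap, deep: FeatureMap, factor: int = 2) -> FeatureMap:
    """Concatenate a shallow map with an upsampled deep map along channels."""
    up = upsample_nearest(deep, factor)
    if len(up[0]) != len(shallow[0]) or len(up[0][0]) != len(shallow[0][0]):
        raise ValueError("spatial sizes must match after upsampling")
    return shallow + up


# Toy inputs (hypothetical): a 1-channel 4x4 shallow map and a 2-channel 2x2 deep map.
shallow = [[[0.0] * 4 for _ in range(4)]]
deep = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
fused = fuse(shallow, deep)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

The fused map keeps the fine spatial grid of the shallow layer while carrying the channels of the deep layer, which is the general reason such fusion is used when targets are small.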


Driverless, Pedestrian Detection, Target Detection, YOLOv3

Cite This Paper

Ruolan Deng, Biqiang Guan, Ziteng Li, Jiahao Wang. Pedestrian detection based on multi-layer feature fusion. Academic Journal of Computing & Information Science (2021), Vol. 4, Issue 5: 76-84. https://doi.org/10.25236/AJCIS.2021.040511.

