Academic Journal of Computing & Information Science, 2024, 7(5); doi: 10.25236/AJCIS.2024.070501.

Dual Subnet Label Assignment for End-to-End Fully Convolutional Pedestrian Detection in Crowd Scenes

Author(s)

Jing Wang1, Xiangqian Li1, Huazhu Xue2, Zhanqiang Huo1

Corresponding Author:
Zhanqiang Huo
Affiliation(s)

1School of Software, Henan Polytechnic University, Jiaozuo, China

2School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo, China

Abstract

The fully convolutional detector uses a one-to-one (O2O) label assignment strategy that removes the NMS post-processing step and realizes end-to-end detection. However, the limited number of positive samples leads to slow convergence of fully convolutional end-to-end detectors in crowded scenes. In this paper, we propose a Dual Subnet Label Assignment network (DSLA), which accelerates model convergence, improves detection performance, and preserves an end-to-end detection pipeline. First, we present Soft Label Assignment (SLA), which uses soft anchors carrying both positive- and negative-sample semantics to guide learning and accelerate convergence; SLA also assigns labels to occluded targets in crowded scenes. We co-train the SLA branch with the O2O branch and perform end-to-end detection through the O2O branch alone. Finally, we propose Feature Shuffle to address the lack of information interaction between different feature layers and to highlight the features of occluded parts in crowded scenes. Experiments demonstrate that DSLA outperforms OneNet in both convergence speed and detection performance. In particular, on the CrowdHuman dataset our method surpasses the state of the art, achieving 91.7 AP, 47.2 MR-2, 80.3 JI, and 98.5 Recall.
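To make the Feature Shuffle idea concrete, below is a minimal sketch that applies ShuffleNet-style channel shuffling [20] across the levels of a feature pyramid, so that each level exchanges an equal slice of its channels with every other level. This is only one plausible reading of the abstract, not the paper's implementation; the function name, the nearest-neighbor resizing, and the equal per-level channel split are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def feature_shuffle(feats):
    """Cross-level Feature Shuffle (illustrative sketch, not the authors' code).

    feats: list of L feature maps, each of shape (N, C, H_l, W_l),
    with C divisible by L. Channels are interleaved across levels in
    the spirit of ShuffleNet's channel shuffle [20].
    """
    L = len(feats)
    n, c, _, _ = feats[0].shape
    g = c // L  # channel slice each level contributes to every level
    # Resize all levels to a common resolution so channels can be mixed.
    ref_hw = feats[0].shape[-2:]
    aligned = [F.interpolate(f, size=ref_hw, mode='nearest') for f in feats]
    # (N, L, C, H, W) -> split channels into L groups of g and swap the
    # level and group axes: the standard channel-shuffle transpose.
    x = torch.stack(aligned, dim=1)
    x = x.view(n, L, L, g, *ref_hw).transpose(1, 2)
    x = x.reshape(n, L, c, *ref_hw)
    # Return each level to its original resolution.
    return [F.interpolate(x[:, i], size=f.shape[-2:], mode='nearest')
            for i, f in enumerate(feats)]

# Example: a 4-level pyramid with 256 channels per level (256 % 4 == 0).
feats = [torch.randn(2, 256, 64 // 2 ** i, 64 // 2 ** i) for i in range(4)]
outs = feature_shuffle(feats)

After the shuffle, every level carries g channels originating from each of the other levels, which is one simple way evidence about occluded parts at one scale could reach the predictions made at another.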

Keywords

Object detection, crowd scenes, end-to-end detector, feature fusion

Cite This Paper

Jing Wang, Xiangqian Li, Huazhu Xue, Zhanqiang Huo. Dual Subnet Label Assignment for End-to-End Fully Convolutional Pedestrian Detection in Crowd Scenes. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 5: 1-14. https://doi.org/10.25236/AJCIS.2024.070501.

References

[1] Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with transformers[C]//Proceedings of the European conference on computer vision. 2020: 213-229.

[2] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30.

[3] Zhang S, Wang X, Wang J, et al. What are expected queries in end-to-end object detection?[J]. arXiv preprint arXiv:2206.01232, 2022.

[4] Zhu X, Su W, Lu L, et al. Deformable detr: Deformable transformers for end-to-end object detection[J]. arXiv preprint arXiv:2010.04159, 2020.

[5] Chen Y, Zhang Z, Cao Y, et al. Reppoints v2: Verification meets regression for object detection[J]. Advances in Neural Information Processing Systems, 2020, 33: 5621-5631.

[6] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.

[7] Zhu B, Wang J, Jiang Z, et al. Autoassign: Differentiable label assignment for dense object detection[J]. arXiv preprint arXiv:2007.03496, 2020.

[8] Tian Z, Shen C, Chen H, et al. FCOS: Fully convolutional one-stage object detection[J]. arXiv preprint arXiv:1904.01355, 2019.

[9] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE international conference on computer vision. 2017: 2980-2988.

[10] Feng C, Zhong Y, Gao Y, et al. Tood: Task-aligned one-stage object detection[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 3490-3499.

[11] Chu X, Zheng A, Zhang X, et al. Detection in crowded scenes: One proposal, multiple predictions[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 12214-12223.

[12] Wang J, Zhao C, Huo Z. High quality proposal feature generation for crowded pedestrian detection[J]. Pattern Recognition, 2022, 128: 108605.

[13] Ren S, He K, Girshick R, et al. Faster r-cnn: Towards real-time object detection with region proposal networks[J]. Advances in neural information processing systems, 2015, 28.

[14] Li X, Wang W, Wu L, et al. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection[J]. Advances in Neural Information Processing Systems, 2020, 33: 21002-21012.

[15] Duan K, Bai S, Xie L, et al. Centernet: Keypoint triplets for object detection[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 6569-6578.

[16] Bodla N, Singh B, Chellappa R, et al. Soft-NMS--improving object detection with one line of code[C]//Proceedings of the IEEE international conference on computer vision. 2017: 5561-5569.

[17] Chen Q, Chen X, Zeng G, et al. Group detr: Fast training convergence with decoupled one-to-many label assignment[J]. arXiv preprint arXiv:2207.13085, 2022, 2(3): 12.

[18] Sun P, Zhang R, Jiang Y, et al. Sparse r-cnn: End-to-end object detection with learnable proposals[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 14454-14463.

[19] Lin M, Li C, Bu X, et al. Detr for crowd pedestrian detection[J]. arXiv preprint arXiv:2012.06785, 2020.

[20] Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6848-6856.

[21] Hu H, Gu J, Zhang Z, et al. Relation networks for object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 3588-3597.

[22] Zhang S, Chi C, Yao Y, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 9759-9768.

[23] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 2117-2125.

[24] Jiang H, Zhang X, Xiang S. Non-maximum suppression guided label assignment for object detection in crowd scenes[J]. IEEE Transactions on Multimedia, 2023.

[25] Wang J, Song L, Li Z, et al. End-to-end object detection with fully convolutional network[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 15846-15858.

[26] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.

[27] Deng J, Dong W, Socher R, et al. Imagenet: A large-scale hierarchical image database[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2009: 248-255.

[28] Loshchilov I, Hutter F. Decoupled weight decay regularization[J]. arXiv preprint arXiv:1711.05101, 2017.

[29] Paszke A, Gross S, Massa F, et al. Pytorch: An imperative style, high-performance deep learning library[J]. Advances in neural information processing systems, 2019, 32.

[30] Chen K, Wang J, Pang J, et al. MMDetection: Open mmlab detection toolbox and benchmark[J]. arXiv preprint arXiv:1906.07155, 2019.

[31] Liu S, Huang D, Wang Y. Adaptive nms: Refining pedestrian detection in a crowd[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 6459-6468.

[32] Lin T Y, Maire M, Belongie S, et al. Microsoft coco: Common objects in context[C]//Proceedings of the European conference on computer vision. Springer, 2014.