Academic Journal of Computing & Information Science, 2024, 7(7); doi: 10.25236/AJCIS.2024.070715.

Small Object Detection in Intelligent Transportation Systems: Design and Optimization of the TSTD Model

Author(s)

Tao Zhang, Yihe Jin, Jialin Wang

Corresponding Author:
Tao Zhang
Affiliation(s)

Tianjin University of Technology and Education, Tianjin, China

Abstract

With the rapid development of transportation systems, road safety and traffic management have become crucial. Efficient traffic sign detection and recognition enhance traffic flow and safety. This paper proposes a Traffic Sign Tiny Detector (TSTD) algorithm to improve the performance of existing small object detection models. The TSTD algorithm adopts EfficientFormerV2 as its backbone, optimizes the loss function with a normalized Wasserstein distance loss tailored to small objects, and employs the C2f_DBB module in place of traditional downsampling, preventing excessive loss of small-object information. EfficientFormerV2 offers higher efficiency and lower computational cost, significantly reducing the model's complexity and training time while maintaining high accuracy. The C2f_DBB module, with its improved feature fusion and dual-branch structure, enhances the model's ability to detect small objects, ensuring high-precision recognition of tiny traffic signs. Extensive comparative experiments verify the model's advantages in traffic sign detection. Results show that TSTD significantly improves key performance metrics, such as mean Average Precision (mAP), over baseline models. In summary, the proposed TSTD can detect traffic signs more accurately, contributing to advancements in intelligent traffic management and improving road safety and traffic efficiency.
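As a minimal sketch of the normalized Wasserstein distance idea the abstract refers to (following Wang et al., reference [6]): each axis-aligned box (cx, cy, w, h) is modeled as a 2D Gaussian N([cx, cy], diag(w²/4, h²/4)), for which the squared 2-Wasserstein distance has a closed form, and the exponential maps it into (0, 1]. The function names and the normalization constant `c` below are illustrative, not taken from the paper; `c` is dataset-dependent in the original formulation.

```python
import math

def nwd(box1, box2, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes
    given as (cx, cy, w, h). For the diagonal Gaussians above, the
    squared 2-Wasserstein distance reduces to the sum of squared
    differences of centers and half-sizes. Returns a value in (0, 1],
    equal to 1.0 for identical boxes."""
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)

def nwd_loss(pred, target, c=12.8):
    # Loss term: decreases toward 0 as the predicted box matches the target.
    return 1.0 - nwd(pred, target, c)
```

Unlike IoU, this similarity stays smooth and informative even when two tiny boxes do not overlap at all, which is why it suits small-object regression.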

Keywords

Traffic Sign; EfficientFormerV2; C2f_DBB; Normalized Wasserstein Distance

Cite This Paper

Tao Zhang, Yihe Jin, Jialin Wang. Small Object Detection in Intelligent Transportation Systems: Design and Optimization of the TSTD Model. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 7: 113-122. https://doi.org/10.25236/AJCIS.2024.070715.

References

[1] Liu, Wei, Anguelov, Dragomir, Erhan, Dumitru, Szegedy, Christian, Reed, Scott, Fu, Cheng-Yang, and Berg, Alexander C. SSD: Single Shot MultiBox Detector. arXiv, (2015).

[2] Redmon, Joseph, Divvala, Santosh, Girshick, Ross, and Farhadi, Ali. You Only Look Once: Unified, Real-Time Object Detection. arXiv, (2015).

[3] Ren, Shaoqing, He, Kaiming, Girshick, Ross, and Sun, Jian. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2017).

[4] Li, Yanyu, Hu, Ju, Wen, Yang, Evangelidis, Georgios, Salahi, Kamyar, Wang, Yanzhi, Tulyakov, Sergey, and Ren, Jian. Rethinking Vision Transformers for MobileNet Size and Speed. arXiv, (2023).

[5] Ding, Xiaohan, Zhang, Xiangyu, Han, Jungong, and Ding, Guiguang. Diverse Branch Block: Building a Convolution as an Inception-like Unit. arXiv, (2021).

[6] Wang, Jinwang, Xu, Chang, Yang, Wen, and Yu, Lei. A Normalized Gaussian Wasserstein Distance for Tiny Object Detection. arXiv, (2021).

[7] Liu, Ze, Lin, Yutong, Cao, Yue, Hu, Han, Wei, Yixuan, Zhang, Zheng, Lin, Stephen, and Guo, Baining. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv, (2021).

[8] Dosovitskiy, Alexey, Beyer, Lucas, Kolesnikov, Alexander, Weissenborn, Dirk, Zhai, Xiaohua, Unterthiner, Thomas, Dehghani, Mostafa, Minderer, Matthias, Heigold, Georg, Gelly, Sylvain, Uszkoreit, Jakob, and Houlsby, Neil. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv, (2020).

[9] Howard, Andrew G., Zhu, Menglong, Chen, Bo, Kalenichenko, Dmitry, Wang, Weijun, Weyand, Tobias, Andreetto, Marco, and Adam, Hartwig. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv, (2017).

[10] Li, Yanyu, Yuan, Geng, Wen, Yang, Hu, Ju, Evangelidis, Georgios, Tulyakov, Sergey, Wang, Yanzhi, and Ren, Jian. EfficientFormer: Vision Transformers at MobileNet Speed. arXiv, (2022).

[11] Vaswani, Ashish, Shazeer, Noam, Parmar, Niki, Uszkoreit, Jakob, Jones, Llion, Gomez, Aidan N., Kaiser, Lukasz, and Polosukhin, Illia. Attention Is All You Need. arXiv, (2017).

[12] Zheng, Zhaohui, Wang, Ping, Liu, Wei, Li, Jinze, Ye, Rongguang, and Ren, Dongwei. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. arXiv, (2019).

[13] Hu, Jie, Shen, Li, and Sun, Gang. Squeeze-and-Excitation Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2019).

[14] Zhang, Jianming, Zou, Xin, Kuang, Lide, and others. CCTSDB 2021: A More Comprehensive Traffic Sign Detection Benchmark. Human-centric Computing and Information Sciences, (2022).

[15] Zhang, Jianming, Wang, Wei, Lu, Chaoqiang, and others. Lightweight Deep Network for Traffic Sign Classification. Annals of Telecommunications, (2020).