
Academic Journal of Computing & Information Science, 2023, 6(2); doi: 10.25236/AJCIS.2023.060215.

MSFF: Multi-Scale feature fusion for fine-grained image classification

Author(s)

Yabo Shang, Hua Huo

Corresponding Author:
Hua Huo
Affiliation(s)

College of Information Engineering, Henan University of Science and Technology, Kaiyuan Avenue 263, Luoyang, 471023, China

Abstract

Fine-grained image classification distinguishes sub-categories that share a common parent category. To address the large intra-class differences and small inter-class differences typical of fine-grained images, this paper proposes a classification method based on multi-scale feature fusion. The method builds a three-branch network. First, an attention module and a local extraction module produce an image of the target object and an image of the parts with strongly discriminative detail features, and deep metric learning uses misclassification information to reduce the distance between samples of the same class, which improves classification accuracy. Second, image information at different scales is fused through a parallel network structure, without using bounding-box or part annotations. Finally, the entire network is optimized by combining the loss functions of the three branches. The branches are trained jointly, end to end, which strengthens the network's ability to represent information and thereby improves classification accuracy. To evaluate the method, fine-grained classification experiments were conducted on three datasets. The results show that the algorithm achieves higher classification accuracy than the other fine-grained classification algorithms compared.
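The abstract states that the network is optimized by combining the loss functions of its three branches, but it does not give the form of those losses or how they are weighted. The sketch below is therefore only an illustration under stated assumptions: each branch is assumed to produce class logits scored with softmax cross-entropy, and the branch losses are assumed to be summed with equal weights. The names `cross_entropy`, `joint_loss`, and the example logits are hypothetical, and the deep metric learning term is left out because the abstract does not specify it.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(branch_logits, label, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-branch losses.

    The equal default weights are an assumption; the abstract does not
    say how the three branch losses are combined.
    """
    return sum(w * cross_entropy(z, label) for w, z in zip(weights, branch_logits))


# Hypothetical logits for one image with three classes, one list per branch:
# the raw image, the located object, and the discriminative parts.
raw_logits = [0.2, 1.5, -0.3]
object_logits = [0.1, 2.4, -0.5]
part_logits = [0.0, 3.1, -1.0]

total = joint_loss([raw_logits, object_logits, part_logits], label=1)
```

Because a single scalar drives the update, gradients from all three branches reach any shared parameters in the same step, which is one way to read the abstract's description of collaborative end-to-end training.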

Keywords

Fine-grained visual classification, Multi-scale feature fusion, Multi-branch network, Deep metric learning

Cite This Paper

Yabo Shang, Hua Huo. MSFF: Multi-Scale feature fusion for fine-grained image classification. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 2: 109-119. https://doi.org/10.25236/AJCIS.2023.060215.
