Academic Journal of Computing & Information Science, 2025, 8(10); doi: 10.25236/AJCIS.2025.081014.
Cui Yanyu1, Li Yingying1, Wang Jianlong1
1School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China
Abstract: To address the weakness of the vision transformer (ViT) in local feature extraction and to effectively integrate global and local information, this paper proposes a Masked autoencoder with Patch merger based on Convolutional neural network and Re-attention (MPCR) for polarimetric SAR (PolSAR) image classification. A CNN is used to divide the input image into patches, which effectively extracts local features and strengthens the model's ability to capture fine details. However, as the number of transformer layers increases, the traditional attention mechanism suffers from information degradation: the attention maps of successive layers become increasingly similar. To address this, re-attention is introduced; by dynamically re-weighting the attention maps, the model maintains diverse feature capture even in deep layers and can thus better handle complex input data. In addition, a simple patch-merging operation is inserted between two consecutive transformer encoder layers, which eliminates redundant computation and further reduces computational complexity. Experimental results show that the proposed method not only significantly improves classification accuracy but also effectively reduces computational cost when processing PolSAR images.
Keywords: Polarimetric Synthetic Aperture Radar; Masked Autoencoder; Vision Transformer; Convolutional Neural Network
Citation: Cui Yanyu, Li Yingying, Wang Jianlong. MPCR: Masked Autoencoder with Patch Merger Based on Convolutional Neural Network and Re-Attention for Polarimetric SAR Image Classification. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 10: 108-123. https://doi.org/10.25236/AJCIS.2025.081014.
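As a rough illustration (not the paper's code), the two mechanisms highlighted in the abstract can be sketched in NumPy: re-attention mixes the per-head attention maps through a learnable head-to-head matrix so that deep layers do not collapse to identical maps, and the patch merger reduces N input tokens to M learned combinations so that later layers are cheaper. All shapes, names, and the row-normalization step below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def re_attention(q, k, v, theta):
    """Re-attention sketch: per-head attention maps are mixed across
    heads by a learnable H x H matrix theta, restoring diversity among
    deep-layer attention maps.
    q, k, v: (H, N, d) per-head queries / keys / values
    theta:   (H, H) head-mixing matrix (non-negative here for simplicity)
    returns: (H, N, d) re-attended values
    """
    H, N, d = q.shape
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (H, N, N)
    mixed = np.einsum('gh,hnm->gnm', theta, attn)          # mix maps across heads
    # Re-normalize each row; the published re-attention uses a Norm layer,
    # row normalization is a simplification assumed for this sketch.
    mixed = mixed / mixed.sum(axis=-1, keepdims=True)
    return mixed @ v

def patch_merger(x, w):
    """Patch-merger sketch: softmax(w @ x.T) @ x maps N tokens to M
    learned combinations, cutting the cost of subsequent layers.
    x: (N, d) tokens; w: (M, d) learnable merger weights; returns (M, d).
    """
    scores = softmax(w @ x.T)  # (M, N), each output token is a convex mix
    return scores @ x
```

Placing such a merger between two encoder layers shrinks the token count from N to M, so every subsequent attention layer costs O(M^2) instead of O(N^2), which matches the abstract's claim of reduced computational complexity.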