Academic Journal of Computing & Information Science, 2022, 5(12); doi: 10.25236/AJCIS.2022.051204.
Xing Liu
School of Information Science and Engineering, Chongqing JiaoTong University, Chongqing, 400074, China
Matching UAV images against satellite images of the same geographical target offers a new approach to UAV positioning and navigation. However, images from these two remote sensing platforms differ greatly in appearance, which makes the task challenging. Existing methods are usually limited to convolutional neural networks, so they make little use of the global information in an image and do not focus on its spatial information. To address these issues, this research proposes a method that extracts pose information using Structure-from-Motion (SfM), uses that pose to align the spatial features of the image, and introduces a visual transformer so that the network focuses on learning the common feature space shared by the two viewpoint sources. Extensive experiments were carried out on the large-scale University-1652 dataset. The experimental results show that the proposed method outperforms the baseline and performs comparably to other advanced methods.
Image Retrieval, Cross-View, Geo-Localization, SFM, Transformer
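The retrieval step behind cross-view geo-localization can be illustrated with a minimal sketch. It assumes that each UAV and satellite image has already been mapped to a feature vector in a shared space, and it ranks the satellite gallery by cosine similarity to a UAV query. The feature values, the building names, and the functions `cosine_similarity` and `rank_gallery` are hypothetical and are not taken from the paper; the choice of cosine similarity is also an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as matching nothing
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query, feat))
              for name, feat in gallery.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical features; in practice they would come from a trained network.
uav_query = [0.9, 0.1, 0.3]
satellite_gallery = {
    "building_A": [0.8, 0.2, 0.4],
    "building_B": [0.1, 0.9, 0.2],
    "building_C": [0.3, 0.3, 0.9],
}

ranking = rank_gallery(uav_query, satellite_gallery)
print(ranking[0][0])  # best-matching satellite image for the UAV query
```

In this toy setting the top-ranked gallery entry plays the role of the predicted location; a retrieval metric such as Recall@1 would count a query as correct when that entry depicts the same target.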
Xing Liu. A cross-view UAV image localization method based on directional feature alignment and visual transformer. Academic Journal of Computing & Information Science (2022), Vol. 5, Issue 12: 22-28. https://doi.org/10.25236/AJCIS.2022.051204.