International Journal of Frontiers in Engineering Technology, 2021, 3(7); doi: 10.25236/IJFET.2021.030701.

Overview of Visual SLAM for Mobile Robots


Chen Wei, Aihua Li

Corresponding Author:
Chen Wei

Rocket Force University of Engineering, Xi’an, Shaanxi 710038, China


Visual Simultaneous Localization and Mapping (visual SLAM) is a technology in which a platform localizes itself and builds a map of its environment using visual sensors. It plays an important role in autonomous mobile robots and autonomous vehicle navigation. This article introduces the classic framework and basic theory of visual SLAM, describes the common methods and research progress for each component, enumerates the landmark achievements in visual SLAM research, and introduces the recent ORB-SLAM3 system. Finally, it discusses the current problems and future research directions of visual SLAM.


Keywords:
Mobile robots, Visual SLAM, Automatic navigation, ORB-SLAM3

Cite This Paper

Chen Wei, Aihua Li. Overview of Visual SLAM for Mobile Robots. International Journal of Frontiers in Engineering Technology (2021), Vol. 3, Issue 7: 1-7. https://doi.org/10.25236/IJFET.2021.030701.

