Abstract: With the rapid development of autonomous driving and virtual reality technologies, visual simultaneous localization and mapping (SLAM) has become a research hotspot in recent years. This survey addresses three main problems of loop-closure detection for visual SLAM in complex environments: place description, decision models, and evaluation of loop-closure detection. First, place description methods based on classical image features, deep learning, depth information, and time-varying maps are introduced, and the advantages and disadvantages of the different methods are analyzed in detail. Second, the decision models commonly used for loop recognition on top of place description are summarized, with particular attention to probability models and sequence matching. Third, performance evaluation methods for loop-closure detection are explained, and their connection with backend optimization is analyzed. Finally, future directions for loop-closure detection are discussed, focusing on several key points such as deep learning, backend optimization, and the fusion of multiple descriptors.
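As a minimal illustration of the place-matching and evaluation ideas summarized above (this sketch is not taken from any surveyed system; the cosine-similarity descriptor comparison, the fixed acceptance threshold, and the function names are assumptions chosen for clarity), a loop-closure candidate can be scored against stored place descriptors and the resulting detections evaluated with precision and recall:

```python
import numpy as np

def match_places(descriptors, query, threshold=0.8):
    """Score a query descriptor against stored place descriptors
    by cosine similarity; accept candidates above the threshold."""
    db = np.asarray(descriptors, dtype=float)
    q = np.asarray(query, dtype=float)
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-12)
    return sims, sims >= threshold

def precision_recall(predicted, ground_truth):
    """Precision and recall of detected loop closures against
    ground-truth loop pairs (both given as boolean arrays)."""
    pred = np.asarray(predicted, dtype=bool)
    gt = np.asarray(ground_truth, dtype=bool)
    tp = np.sum(pred & gt)                       # true positives
    precision = tp / max(np.sum(pred), 1)        # correctness of detections
    recall = tp / max(np.sum(gt), 1)             # coverage of true loops
    return precision, recall
```

Sweeping the threshold and recording (precision, recall) pairs yields the precision-recall curve typically used to compare loop-closure detectors; for backend optimization, precision at 100% is the figure of merit, since a single false loop closure can corrupt the pose graph.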
[1] Cheng J, Jiang Z, Zhang Y, et al. Toward robust linear SLAM[C]//IEEE International Conference on Mechatronics and Automation. Piscataway, USA:IEEE, 2014:705-710.
[2] Lowe D G. Object recognition from local scale-invariant features[C]//IEEE International Conference on Computer Vision. Piscataway, USA:IEEE, 1999:1150-1157.
[3] Se S, Lowe D G, Little J J. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks[J]. International Journal of Robotics Research, 2002, 21(8):735-760.
[4] Stumm E, Mei C, Lacroix S. Probabilistic place recognition with covisibility maps[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA:IEEE, 2013:4158-4163.
[5] Košecká J, Li F, Yang X. Global localization and relative positioning based on scale-invariant key points[J]. Robotics & Autonomous Systems, 2005, 52(1):27-38.
[6] Bay H, Tuytelaars T, Gool L V. SURF:Speeded up robust features[C]//Lecture Notes in Computer Science, Vol.3951. Berlin, Germany:Springer-Verlag, 2006:404-417.
[7] Rublee E, Rabaud V, Konolige K, et al. ORB:An efficient alternative to SIFT or SURF[C]//IEEE International Conference on Computer Vision. Piscataway, USA:IEEE, 2011:2564-2571.
[8] Sivic J, Zisserman A. Video Google:A text retrieval approach to object matching in videos[C]//IEEE International Conference on Computer Vision. Piscataway, USA:IEEE, 2003:1470-1477.
[9] Galvez-López D, Tardos J D. Bags of binary words for fast place recognition in image sequences[J]. IEEE Transactions on Robotics, 2012, 28(5):1188-1197.
[10] Mur-Artal R, Tardos J D. ORB-SLAM2:An open-source SLAM system for monocular, stereo, and RGB-D cameras[J]. IEEE Transactions on Robotics, 2017, 33(5):1255-1262.
[11] Angeli A, Doncieux S, Meyer J A, et al. Incremental vision-based topological SLAM[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA:IEEE, 2008:1031-1036.
[12] Cummins M J, Newman P M. Appearance-only SLAM at large scale with FAB-MAP 2.0[J]. International Journal of Robotics Research, 2011, 30(9):1100-1123.
[13] Paul R, Newman P M. FAB-MAP 3D:Topological mapping with spatial and visual appearance[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2010:2649-2656.
[14] Nicosevici T, Garcia R. Automatic visual bag-of-words for online robot navigation and mapping[J]. IEEE Transactions on Robotics, 2012, 28(4):886-898.
[15] Cadena C, Galvez-López D, Tardos J D, et al. Robust place recognition with stereo sequences[J]. IEEE Transactions on Robotics, 2012, 28(4):871-885.
[16] Valgren C, Lilienthal A J. SIFT, SURF and seasons:Appearance-based long-term localization in outdoor environments[J]. Robotics & Autonomous Systems, 2010, 58(2):149-156.
[17] Loquercio A, Dymczyk M, Zeisl B, et al. Efficient descriptor learning for large scale localization[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2017. DOI:10.1109/ICRA.2017.7989359.
[18] Gao X, Zhang T. Unsupervised learning to detect loops using deep neural networks for visual SLAM system[J]. Autonomous Robots, 2017, 41(1):1-18.
[19] Oliva A, Torralba A. Building the gist of a scene:The role of global image features in recognition[J]. Progress in Brain Research, 2006, 155(2):23-36.
[20] Kröse B J A, Vlassis N, Bunschoten R, et al. A probabilistic model for appearance-based robot localization[J]. Image & Vision Computing, 2001, 19(6):381-391.
[21] Lowry S M, Wyeth G F, Milford M J. Unsupervised online learning of condition-invariant images for place recognition[J]. Procedia-Social and Behavioral Sciences, 2014, 106:1418-1427.
[22] Ulrich I, Nourbakhsh I. Appearance-based place recognition for topological localization[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2000:1023-1029.
[23] Sunderhauf N, Protzel P. BRIEF-Gist - Closing the loop by simple means[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA:IEEE, 2011:1234-1241.
[24] Chen Z, Lam O, Jacobson A, et al. Convolutional neural network-based place recognition[EB/OL]. (2014-11-06)[2018-01-02]. http://pdfs.semanticscholar.org/a93d/6e82cd500663a8eea02a8e3617632aafe913.pdf.
[25] Sunderhauf N, Shirazi S, Dayoub F, et al. On the performance of ConvNet features for place recognition[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA:IEEE, 2015:4297-4304.
[26] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6):84-90.
[27] Arandjelovic R, Gronat P, Torii A, et al. NetVLAD:CNN architecture for weakly supervised place recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA:IEEE, 2016:5297-5307.
[28] Jegou H, Douze M, Schmid C, et al. Aggregating local descriptors into a compact image representation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA:IEEE, 2010:3304-3311.
[29] Lopez-Antequera M, Gomez-Ojeda R, Petkov N, et al. Appearance-invariant place recognition by discriminatively training a convolutional neural network[J]. Pattern Recognition Letters, 2017, 92(1):89-95.
[30] Naseer T, Oliveira G L, Brox T, et al. Semantics-aware visual localization under challenging perceptual conditions[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2017:2614-2620.
[31] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA:IEEE, 2015:3431-3440.
[32] Torii A, Arandjelovic R, Sivic J, et al. 24/7 place recognition by view synthesis[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA:IEEE, 2015:1808-1817.
[33] Milford M J, Wyeth G F. SeqSLAM:Visual route-based navigation for sunny summer days and stormy winter nights[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2012:1643-1649.
[34] Felzenszwalb P F, Huttenlocher D P. Efficient graph-based image segmentation[J]. International Journal of Computer Vision, 2004, 59(2):167-181.
[35] Manen S, Guillaumin M, Gool L V. Prime object proposals with randomized Prim's algorithm[C]//IEEE International Conference on Computer Vision. Piscataway, USA:IEEE, 2014:2536-2543.
[36] Neubert P, Protzel P. Local region detector + CNN based landmarks for practical place recognition in changing environments[C]//European Conference on Mobile Robots. Piscataway, USA:IEEE, 2015. DOI:10.1109/ECMR.2015.7324051.
[37] Neubert P, Protzel P. Beyond holistic descriptors, keypoints, and fixed patches:Multiscale superpixel grids for place recognition in changing environments[J]. IEEE Robotics & Automation Letters, 2016, 1(1):484-491.
[38] Ren S, He K, Girshick R, et al. Faster R-CNN:Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6):1137-1149.
[39] Zitnick C L, Dollár P. Edge boxes:Locating object proposals from edges[M]//Lecture Notes in Computer Science, Vol.8693. Berlin, Germany:Springer-Verlag, 2014:391-405.
[40] Sunderhauf N, Shirazi S, Jacobson A, et al. Place recognition with ConvNet landmarks:Viewpoint-robust, condition-robust, training-free[C]//Robotics:Science and Systems. Cambridge, USA:MIT Press, 2015. DOI:10.15607/RSS.2015.XI.022.
[41] Cascianelli S, Costante G, Bellocchio E, et al. Robust visual semi-semantic loop closure detection by a covisibility graph and CNN features[J]. Robotics and Autonomous Systems, 2017, 92:53-65.
[42] Lowry S M, Sunderhauf N, Newman P M, et al. Visual place recognition:A survey[J]. IEEE Transactions on Robotics, 2016, 32(1):1-19.
[43] Mei C, Sibley G, Newman P M. Closing loops without places[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA:IEEE, 2010:3738-3744.
[44] Engel J, Koltun V, Cremers D. Direct sparse odometry[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(3):611-625.
[45] Endres F, Hess J, Sturm J, et al. 3-D mapping with an RGB-D camera[J]. IEEE Transactions on Robotics, 2014, 30(1):177-187.
[46] Engel J, Schöps T, Cremers D. LSD-SLAM:Large-scale direct monocular SLAM[C]//Lecture Notes in Computer Science, Vol.8690. Berlin, Germany:Springer-Verlag, 2014:834-849.
[47] Salas-Moreno R F, Newcombe R A, Strasdat H, et al. SLAM++:Simultaneous localisation and mapping at the level of objects[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA:IEEE, 2013:1352-1359.
[48] Neubert P, Sunderhauf N, Protzel P. Superpixel-based appearance change prediction for long-term navigation across seasons[J]. Robotics & Autonomous Systems, 2015, 69(1):15-27.
[49] Lowry S M, Milford M J, Wyeth G F. Transforming morning to afternoon using linear regression techniques[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2014:3950-3955.
[50] Biber P, Duckett T. Experimental analysis of sample-based maps for long-term SLAM[J]. International Journal of Robotics Research, 2009, 28(1):20-33.
[51] Manning C D, Raghavan P, Schütze H. Introduction to information retrieval[M]. Cambridge, UK:Cambridge University Press, 2008.
[52] Maddern W, Milford M J, Wyeth G F. CAT-SLAM:Probabilistic localisation and mapping using a continuous appearance-based trajectory[J]. International Journal of Robotics Research, 2012, 31(4):429-451.
[53] Vlassis N, Krose B. Robot environment modeling via principal component regression[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA:IEEE, 1999:677-682.
[54] Thrun S, Burgard W, Fox D. A probabilistic approach to concurrent mapping and localization for mobile robots[J]. Machine Learning, 1998, 31(1/2/3):29-53.
[55] Ramos F, Upcroft B, Kumar S, et al. A Bayesian approach for place recognition[J]. Robotics and Autonomous Systems, 2012, 60(4):487-497.
[56] Cummins M J, Newman P M. FAB-MAP:Probabilistic localization and mapping in the space of appearance[J]. International Journal of Robotics Research, 2008, 27(6):647-665.
[57] Chow C K, Liu C N. Approximating discrete probability distributions with dependence trees[J]. IEEE Transactions on Information Theory, 1968, 14(3):462-467.
[58] Angeli A, Filliat D, Doncieux S, et al. Fast and incremental method for loop-closure detection using bags of visual words[J]. IEEE Transactions on Robotics, 2008, 24(5):1027-1037.
[59] Filliat D. A visual bag of words method for interactive qualitative localization and mapping[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2011:3921-3926.
[60] Naseer T, Suger B, Ruhnke M, et al. Vision-based Markov localization across large perceptual changes[C]//European Conference on Mobile Robots. Piscataway, USA:IEEE, 2015:1-6.
[61] Naseer T, Suger B, Ruhnke M, et al. Vision-based Markov localization for long-term autonomy[J]. Robotics and Autonomous Systems, 2016, 89:147-157.
[62] Clemente L A, Davison A J, Reid I D, et al. Mapping large loops with a single hand-held camera[C]//Robotics:Science and Systems. Cambridge, USA:MIT Press, 2007:297-304.
[63] Wolf J, Burgard W, Burkhardt H. Robust vision-based localization by combining an image-retrieval system with Monte Carlo localization[J]. IEEE Transactions on Robotics, 2005, 21(2):208-216.
[64] Pupilli M, Calway A. Real-time visual SLAM with resilience to erratic motion[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, USA:IEEE, 2006:1244-1249.
[65] Milford M J, Wyeth G F, Prasser D. RatSLAM:A hippocampal model for simultaneous localization and mapping[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2004:403-408.
[66] Ho K L, Newman P M. Detecting loop closure with scene sequences[J]. International Journal of Computer Vision, 2007, 74(3):261-286.
[67] Johns E, Yang G Z. Feature co-occurrence maps:Appearance-based localisation throughout the day[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2013:3212-3218.
[68] Naseer T, Spinello L, Burgard W, et al. Robust visual robot localization across seasons using network flows[C]//AAAI Conference on Artificial Intelligence. Menlo Park, CA:AAAI Press, 2014:2564-2570.
[69] Hansen P, Browning B. Visual place recognition using HMM sequence matching[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA:IEEE, 2014:4549-4555.
[70] Liu Y, Zhang H. Towards improving the efficiency of sequence-based SLAM[C]//IEEE International Conference on Mechatronics and Automation. Piscataway, USA:IEEE, 2013:1261-1266.
[71] Siam S M, Zhang H. Fast-SeqSLAM:A fast appearance based place recognition algorithm[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2017:5702-5708.
[72] Milford M J, Wyeth G F. Mapping a suburb with a single camera using a biologically inspired SLAM system[J]. IEEE Transactions on Robotics, 2008, 24(5):1038-1053.
[73] Latif Y, Huang G, Leonard J, et al. An online sparsity-cognizant loop-closure algorithm for visual navigation[C]//Robotics:Science and Systems. Cambridge, USA:MIT Press, 2014.
[74] Bazeille S, Filliat D. Incremental topo-metric SLAM using vision and robot odometry[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2011:4067-4073.
[75] Pepperell E, Corke P, Milford M J. Towards persistent visual navigation using SMART[C/OL]//Australasian Conference on Robotics and Automation. Australia:ARAA, 2013.[2018-01-02]. http://www.araa.asn.au/acra/acra2013/papers/pap131s1-file1.pdf.
[76] Badino H, Huber D, Kanade T. Real-time topometric localization[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2012:1635-1642.
[77] Latif Y, Cadena C, Neira J. Robust loop closing over time for pose graph SLAM[J]. International Journal of Robotics Research, 2013, 32(14):1611-1626.
[78] Konolige K, Agrawal M. FrameSLAM:From bundle adjustment to real-time visual mapping[J]. IEEE Transactions on Robotics, 2008, 24(5):1066-1077.
[79] Konolige K, Bowman J, Chen J D, et al. View-based maps[J]. International Journal of Robotics Research, 2010, 29(8):941-957.
[80] Sunderhauf N, Protzel P. Towards a robust back-end for pose graph SLAM[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2012:1254-1261.
[81] Agarwal P, Tipaldi G D, Spinello L, et al. Robust map optimization using dynamic covariance scaling[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2013:62-69.
[82] Chen Z, Jacobson A, Sünderhauf N, et al. Deep learning features at scale for visual place recognition[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2017:3223-3230.
[83] Wang S, Clark R, Wen H, et al. DeepVO:Towards end-to-end visual odometry with deep recurrent convolutional neural networks[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA:IEEE, 2017:2043-2050.
[84] Redmon J, Divvala S, Girshick R, et al. You only look once:Unified, real-time object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, USA:IEEE, 2016:779-788.
[85] Hou Y, Zhang H, Zhou S. Evaluation of object proposals and ConvNet features for landmark-based visual place recognition[J]. Journal of Intelligent & Robotic Systems, 2017(1):1-16.
[86] Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks[C]//IEEE International Conference on Computer Vision. Piscataway, USA:IEEE, 2015:1529-1537.
[87] Carlone L, Rosen D M, Calafiore G, et al. Lagrangian duality in 3D SLAM:Verification techniques and optimal solutions[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA:IEEE, 2015:125-132.
[88] Khosoussi K, Huang S, Dissanayake G. Novel insights into the impact of graph structure on SLAM[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA:IEEE, 2014:2707-2714.
[89] Wang X L, Peng G H, Zhang H. Combining multiple image descriptions for loop closure detection[J/OL]. Journal of Intelligent & Robotic Systems, 2017, (2017-12-11)[2018-01-01]. https://doi.org/10.1007/s10846-017-0755-7.
[90] Yu W, Yang K, Yao H, et al. Exploiting the complementary strengths of multi-layer CNN features for image retrieval[J]. Neurocomputing, 2017, 237:235-241.