Loop Closure Detection Based on Local Semantic Topology for Visual SLAM System
ZHANG Kuojia1, ZHANG Yunzhou1,2, Lü Guanghao1, GONG Yiqun1
1. Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110016, China;
2. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
ZHANG Kuojia, ZHANG Yunzhou, Lü Guanghao, GONG Yiqun. Loop Closure Detection Based on Local Semantic Topology for Visual SLAM System. ROBOT, 2019, 41(5): 649-659. DOI: 10.13973/j.cnki.robot.190005.
Abstract: To address the false positive detections produced by existing loop closure detection methods in visual SLAM (simultaneous localization and mapping), the YOLOv3 (you only look once v3) object detection algorithm is adopted to obtain semantic information from the scene. False and missed detections are corrected by the DBSCAN (density-based spatial clustering of applications with noise) algorithm to create semantic nodes, from which a local semantic topology graph is constructed for each keyframe. After matching semantic nodes based on visual features and object class information, the transformation relationships between corresponding edges in different semantic topology graphs are computed to obtain a similarity score between keyframes, and loop closures are judged according to how the similarity changes over consecutive keyframes. Experiments on benchmark datasets show that object clustering effectively improves the accuracy of loop closure detection in indoor scenes. Compared with algorithms based solely on traditional visual features, the proposed algorithm achieves more accurate loop closure detection.
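To illustrate the detection-correction step described above, the following is a minimal, self-contained sketch of DBSCAN applied to 2-D bounding-box centres, merging overlapping duplicate detections of one object into a single semantic node and discarding an isolated spurious box as noise. The coordinates and the `eps`/`min_pts` parameters are illustrative assumptions, not values from the paper.

```python
from math import hypot

def dbscan(points, eps=30.0, min_pts=2):
    """Plain DBSCAN over 2-D points (e.g. bounding-box centres).

    Returns one cluster label per point; -1 marks noise. A point whose
    eps-neighbourhood contains at least min_pts points seeds a cluster,
    which then absorbs every density-reachable neighbour.
    """
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if hypot(points[i][0] - q[0], points[i][1] - q[1]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1          # provisionally noise (may later join a cluster)
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nb)
        while seeds:                # expand the cluster from its core points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:
                seeds.extend(nb_j)   # j is a core point: keep expanding
    return labels

# Three overlapping detections of one object, plus one spurious box far away:
centres = [(100, 100), (110, 105), (95, 98), (400, 400)]
print(dbscan(centres))  # -> [0, 0, 0, -1]
```

The three nearby centres receive a common label and would be merged into one semantic node, while the outlier is rejected, which is the role DBSCAN plays in filtering the raw YOLOv3 output before building the local semantic topology graph.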