An RGB-D SLAM Algorithm Based on Dynamic Coupling and Spatial Data Association
NIU Minyu1,2,3, HUANG Yiqing1,2,3
1. School of Electrical Engineering, Anhui Polytechnic University, Wuhu 241000, China; 2. Anhui Key Laboratory of Electric Drive and Control, Wuhu 241000, China; 3. Key Laboratory of Advanced Perception and Intelligent Control of High-end Equipment, Ministry of Education, Wuhu 241000, China
Abstract: To address the degradation of localization and mapping accuracy that visual SLAM (simultaneous localization and mapping) algorithms suffer in dynamic environments, an RGB-D SLAM algorithm based on dynamic coupling and spatial data association is proposed. First, a semantic network is used to obtain pre-processed semantic segmentation images, and an edge detection algorithm combined with adjacent-semantic judgment extracts complete semantic dynamic objects. Second, the initial camera pose is estimated in a dense direct-method module, in which the dynamic coupling scores are computed not only from conventional dynamic-region elimination but also from spatial plane consistency and depth-information screening. The map point set is then updated in real time by the spatial data association algorithm together with the camera pose, and the pose is further refined by minimizing the reprojection error and by closed-loop optimization. Finally, an octree dense map is constructed from the camera pose and the map point set, eliminating all dynamic regions from plane to space, so that a static map is built in the dynamic environment. Test results on highly dynamic sequences of the TUM dataset show that the positioning error of the proposed algorithm is about 90% lower than that of the ORB-SLAM algorithm, and that the proposed algorithm effectively improves the positioning accuracy and camera pose estimation accuracy of RGB-D SLAM.
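The dynamic-region screening step described in the abstract can be illustrated by a minimal sketch. This is not the authors' implementation: the class set, the depth tolerance, and all function names here are assumptions chosen for illustration. The idea is that a pixel is discarded as dynamic either because its semantic label belongs to a movable class, or because its observed depth disagrees with the depth predicted from the current camera pose.

```python
# Hypothetical sketch of semantic + depth screening for dynamic pixels.
# DYNAMIC_CLASSES and DEPTH_TOL are assumed values, not from the paper.

DYNAMIC_CLASSES = {"person"}   # assumed set of a-priori movable classes
DEPTH_TOL = 0.05               # assumed depth-consistency tolerance (m)

def is_dynamic(label, depth_obs, depth_pred):
    """Flag a pixel as dynamic if its semantic label is a movable class,
    or if its observed depth deviates from the pose-predicted depth by
    more than the tolerance."""
    if label in DYNAMIC_CLASSES:
        return True
    return abs(depth_obs - depth_pred) > DEPTH_TOL

def static_mask(labels, depth_obs, depth_pred):
    """Per-pixel mask: True where the pixel is kept as static."""
    return [not is_dynamic(lb, do, dp)
            for lb, do, dp in zip(labels, depth_obs, depth_pred)]
```

For example, `static_mask(["person", "wall"], [1.2, 3.0], [1.5, 3.01])` keeps only the wall pixel: the person is rejected semantically, while the wall passes the depth-consistency check. Pixels surviving this mask would feed the direct-method pose estimation and, later, the octree map.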