高兴波, 史旭华, 葛群峰, 陈奎烨. 面向动态物体场景的视觉SLAM综述[J]. 机器人, 2021, 43(6): 733-750. DOI: 10.13973/j.cnki.robot.200323.
GAO Xingbo, SHI Xuhua, GE Qunfeng, CHEN Kuiye. A Survey of Visual SLAM for Scenes with Dynamic Objects. ROBOT, 2021, 43(6): 733-750. DOI: 10.13973/j.cnki.robot.200323.
Abstract: Visual SLAM (simultaneous localization and mapping) for scenes containing dynamic objects, a current research hotspot in robot navigation, autonomous driving and related fields, is surveyed. According to how dynamic objects are handled during localization and mapping, dynamic SLAM is divided into three research directions: dynamics-robust SLAM and static background reconstruction, non-rigid dynamic object tracking and reconstruction, and moving object tracking and reconstruction. Each of the three directions is reviewed in turn, with emphasis on dynamic SLAM approaches that incorporate deep learning. Finally, future development directions of dynamic SLAM are envisioned.