Abstract: To address the drawbacks of semi-direct monocular visual odometry, namely the lack of a scale factor and poor robustness under fast motion, a semi-direct monocular visual odometry with inertial fusion is designed. By using IMU (inertial measurement unit) information to compensate for the deficiencies of visual odometry, tracking accuracy and system robustness are effectively improved. Through joint initialization of visual information and inertial measurements, the environmental scale can be accurately recovered. To improve the robustness of motion tracking, an IMU-weighted prior model is proposed: the IMU state estimate is obtained by preintegration, the weight coefficient is adjusted according to the IMU prior error, and the weighted IMU state then provides an accurate initial estimate for the front-end. In the back-end, a tightly coupled graph optimization model is constructed that combines inertial measurements, visual observations, and 3D map points for joint optimization. Using co-visibility relationships as constraints within a sliding window improves optimization efficiency and accuracy while eliminating local accumulated error. Experimental results show that the proposed prior model outperforms both the uniform-motion model and the plain IMU prior model, with a single-frame prior error of less than 1 cm. The improved back-end increases computational efficiency by 52%, while improving both trajectory accuracy and optimization stability. Experiments on the public EuRoC dataset demonstrate that the proposed algorithm outperforms the open keyframe-based visual-inertial SLAM (OKVIS) algorithm, and the root mean square error of the trajectory is reduced to one third of that of the original visual odometry.
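The IMU-weighted prior described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and parameter names are assumptions, poses are simplified to translation vectors rather than full SE(3) states, and the Gaussian weighting kernel is one plausible choice for decaying the IMU weight as its prior error grows.

```python
import numpy as np

def weighted_prior(pose_imu, pose_cv, prior_error, sigma=0.01):
    """Blend the IMU-preintegrated pose prediction with a constant-velocity
    (uniform motion) prediction to initialize front-end tracking.

    pose_imu    : pose predicted by IMU preintegration (here: 3-vector)
    pose_cv     : pose predicted by the uniform-motion model (3-vector)
    prior_error : observed IMU prior error, e.g. in meters
    sigma       : error scale at which IMU trust starts to decay (assumed)
    """
    # Weight in (0, 1]: near 1 when the IMU prior error is small,
    # approaching 0 when the IMU prediction has proven unreliable.
    w = np.exp(-(prior_error / sigma) ** 2)
    return w * pose_imu + (1.0 - w) * pose_cv
```

With `prior_error = 0` the blend returns the pure IMU prediction; as the error grows, the initial estimate falls back toward the uniform-motion model, which is what makes the prior robust to IMU drift during fast motion.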
References:
[1] Royer E, Lhuillier M, Dhome M, et al. Monocular vision for mobile robot localization and autonomous navigation[J]. International Journal of Computer Vision, 2007, 74(3): 237-260.
[2] Moreno-Armendariz M A, Calvo H. Visual SLAM and obstacle avoidance in real time for mobile robots navigation[C]//IEEE International Conference on Mechatronics, Electronics and Automotive Engineering. Piscataway, USA: IEEE, 2014: 44-49.
[3] Taketomi T, Uchiyama H, Ikeda S. Visual SLAM algorithms: A survey from 2010 to 2016[J]. IPSJ Transactions on Computer Vision and Applications, 2017, 9(1): 16-26.
[4] Klein G, Murray D. Parallel tracking and mapping for small AR workspaces[C]//6th IEEE and ACM International Symposium on Mixed and Augmented Reality. Piscataway, USA: IEEE, 2007: 225-234.
[5] Mur-Artal R, Montiel J M M, Tardos J D. ORB-SLAM: A versatile and accurate monocular SLAM system[J]. IEEE Transactions on Robotics, 2015, 31(5): 1147-1163.
[6] Forster C, Pizzoli M, Scaramuzza D. SVO: Fast semi-direct monocular visual odometry[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA: IEEE, 2014: 15-22.
[7] Forster C, Zhang Z, Gassner M, et al. SVO: Semidirect visual odometry for monocular and multicamera systems[J]. IEEE Transactions on Robotics, 2017, 33(2): 249-265.
[8] Engel J, Schöps T, Cremers D. LSD-SLAM: Large-scale direct monocular SLAM[C]//13th European Conference on Computer Vision. Berlin, Germany: Springer, 2014: 834-849.
[9] Engel J, Koltun V, Cremers D. Direct sparse odometry[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(3): 611-625.
[10] Kelly J, Sukhatme G S. Visual-inertial simultaneous localization, mapping and sensor-to-sensor self-calibration[C]//IEEE International Symposium on Computational Intelligence in Robotics and Automation. Piscataway, USA: IEEE, 2009: 360-368.
[11] Lupton T, Sukkarieh S. Visual-inertial-aided navigation for high-dynamic motion in built environments without initial conditions[J]. IEEE Transactions on Robotics, 2012, 28(1): 61-76.
[12] Forster C, Carlone L, Dellaert F, et al. On-manifold preintegration for real-time visual-inertial odometry[J]. IEEE Transactions on Robotics, 2017, 33(1): 1-21.
[13] Mur-Artal R, Tardos J D. Visual-inertial monocular SLAM with map reuse[J]. IEEE Robotics and Automation Letters, 2017, 2(2): 796-803.
[14] Qin T, Li P L, Shen S J. VINS-Mono: A robust and versatile monocular visual-inertial state estimator[J]. IEEE Transactions on Robotics, 2018, 34(4): 1004-1020.
[15] Liu Y, Chen Z, Zheng W J, et al. Monocular visual-inertial SLAM: Continuous preintegration and reliable initialization[J]. Sensors, 2017, 17(11): 2613-2637.
[16] Jones E S, Soatto S. Visual-inertial navigation, mapping and localization: A scalable real-time causal approach[J]. International Journal of Robotics Research, 2011, 30(4): 407-430.
[17] Cadena C, Carlone L, Carrillo H, et al. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age[J]. IEEE Transactions on Robotics, 2016, 32(6): 1309-1332.
[18] Leutenegger S, Lynen S, Bosse M, et al. Keyframe-based visual-inertial odometry using nonlinear optimization[J]. International Journal of Robotics Research, 2015, 34(3): 314-334.
[19] Kummerle R, Grisetti G, Strasdat H, et al. g2o: A general framework for graph optimization[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA: IEEE, 2011: 3607-3613.
[20] Burri M, Nikolic J, Gohl P, et al. The EuRoC micro aerial vehicle datasets[J]. International Journal of Robotics Research, 2016, 35(10): 1157-1163.
[21] Qin T, Shen S J. Robust initialization of monocular visual-inertial estimation on aerial robots[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway, USA: IEEE, 2017: 4225-4232.
[22] Martinelli A. Closed-form solution of visual-inertial structure from motion[J]. International Journal of Computer Vision, 2014, 106(2): 138-152.
[23] Kaiser J, Martinelli A, Fontana F, et al. Simultaneous state initialization and gyroscope bias calibration in visual inertial aided navigation[J]. IEEE Robotics and Automation Letters, 2017, 2(1): 18-25.
[24] von Stumberg L, Usenko V, Cremers D. Direct sparse visual-inertial odometry using dynamic marginalization[C]//IEEE International Conference on Robotics and Automation. Piscataway, USA: IEEE, 2018: 2510-2517.
[25] Mur-Artal R, Tardos J D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras[J]. IEEE Transactions on Robotics, 2017, 33(5): 1255-1262.