CHEN Zonghai, HONG Yang, WANG Jikai, GE Zhenhua. Monocular Visual Odometry Based on Recurrent Convolutional Neural Networks[J]. ROBOT, 2019, 41(2): 147-155. DOI: 10.13973/j.cnki.robot.180314

Monocular Visual Odometry Based on Recurrent Convolutional Neural Networks

  • A monocular visual odometry method based on a convolutional long short-term memory (LSTM) network and a convolutional neural network (CNN) is proposed, named LSTM visual odometry (LSTMVO). LSTMVO uses an unsupervised end-to-end deep learning framework to simultaneously estimate the 6-DoF (degree-of-freedom) camera pose and the scene depth from monocular images. The overall framework comprises a pose estimation network and a depth estimation network. The pose estimation network is a deep recurrent convolutional neural network (RCNN) that performs end-to-end monocular pose estimation, combining CNN-based feature extraction with temporal modeling based on recurrent neural networks (RNN). The depth estimation network generates dense depth maps using an encoder-decoder architecture. In addition, a new loss function is proposed for network training, consisting of a temporal sequence loss, a depth smoothness loss, and a forward-backward consistency loss over the image sequence. Experimental results on the KITTI dataset show that, trained only on raw monocular RGB images, LSTMVO outperforms existing mainstream monocular visual odometry methods in both pose estimation accuracy and depth estimation accuracy, verifying the effectiveness of the proposed deep learning framework.
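The abstract names three loss terms but gives no formulas. A minimal NumPy sketch of one plausible composition is shown below, assuming common forms from the unsupervised depth/ego-motion literature: an L1 photometric temporal loss between a target frame and a warped source frame, an edge-aware depth smoothness loss, and an L1 forward-backward consistency loss. All function names, weights, and exact formulations here are illustrative assumptions, not the paper's actual definitions.

```python
import numpy as np

def temporal_loss(i_target, i_warped):
    # L1 photometric difference between the target frame and a source
    # frame warped by the predicted pose and depth (hypothetical form).
    return np.mean(np.abs(i_target - i_warped))

def smoothness_loss(depth, image):
    # Edge-aware smoothness: penalize depth gradients, down-weighted
    # where the image itself has strong gradients (a common choice).
    dx_d = np.abs(np.diff(depth, axis=1))
    dy_d = np.abs(np.diff(depth, axis=0))
    dx_i = np.mean(np.abs(np.diff(image, axis=1)), axis=-1)
    dy_i = np.mean(np.abs(np.diff(image, axis=0)), axis=-1)
    return np.mean(dx_d * np.exp(-dx_i)) + np.mean(dy_d * np.exp(-dy_i))

def consistency_loss(pred_fwd, pred_bwd):
    # Forward-backward consistency: predictions from the sequence played
    # forward and backward should agree (simplified to an L1 penalty).
    return np.mean(np.abs(pred_fwd - pred_bwd))

def total_loss(i_target, i_warped, depth, pred_fwd, pred_bwd,
               w_smooth=0.5, w_consist=0.2):
    # Weighted sum of the three terms; the weights are illustrative.
    return (temporal_loss(i_target, i_warped)
            + w_smooth * smoothness_loss(depth, i_target)
            + w_consist * consistency_loss(pred_fwd, pred_bwd))
```

Each term vanishes when its inputs agree (identical frames, constant depth, matching forward/backward predictions), so the sketch behaves as a sanity check on the qualitative description in the abstract.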
