Abstract:
Traditional visual place recognition (VPR) methods generally rely on frame-based cameras and therefore often fail under dramatic illumination changes or fast motion. To overcome this, an end-to-end VPR network using event cameras is proposed, which achieves good VPR performance in challenging environments. The key idea of the proposed algorithm is to first characterize the event streams with the event spike tensor (EST) voxel grid, then extract features using a deep residual network, and finally aggregate features using an improved VLAD (vector of locally aggregated descriptors) network, realizing end-to-end VPR directly on event streams. Comparative experiments between the proposed method and classical VPR methods are carried out on event-based driving datasets (MVSEC, DDD17) and a synthetic event-stream dataset (Oxford RobotCar). The results show that the proposed method outperforms frame-based VPR methods in challenging scenarios (such as night scenes), with an approximately 6.61% improvement in the Recall@1 index. To our knowledge, this is the first end-to-end, weakly supervised deep network architecture for visual place recognition that directly processes event stream data.
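To illustrate the first step of the pipeline, the following is a minimal NumPy sketch of converting an event stream into a voxel-grid tensor with a bilinear temporal kernel, in the spirit of the EST representation; the function name, the (x, y, t, p) event layout, and the 2-channel polarity split are illustrative assumptions, not the paper's exact implementation (which learns the kernel end-to-end).

```python
import numpy as np

def est_voxel_grid(events, H, W, B):
    """Accumulate events into a (polarity, time-bin, H, W) voxel grid.

    events : (N, 4) array of rows (x, y, t, p), with polarity p in {-1, +1}.
    H, W   : sensor height and width in pixels.
    B      : number of temporal bins.
    """
    grid = np.zeros((2, B, H, W), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2].astype(float)
    p = events[:, 3]

    # Normalize timestamps to [0, B-1] over the stream's duration.
    span = max(t.max() - t.min(), 1e-9)
    t_norm = (B - 1) * (t - t.min()) / span

    # Channel 0 holds negative-polarity events, channel 1 positive.
    ch = (p > 0).astype(int)

    for b in range(B):
        # Bilinear temporal kernel: weight fades linearly with
        # distance of the event's normalized time from bin b.
        w = np.maximum(0.0, 1.0 - np.abs(t_norm - b))
        np.add.at(grid, (ch, b, y, x), w)
    return grid
```

Each event thus contributes its measurement to the two temporally nearest bins in proportion to its distance from them; the resulting dense tensor can be fed to a standard convolutional backbone such as a residual network.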