Citation: KONG Delei, FANG Zheng, LI Haojia, HOU Kuanxu, JIANG Junjie. An End-to-End Weakly Supervised Network Architecture for Event-based Visual Place Recognition[J]. ROBOT, 2022, 44(5): 613-625. DOI: 10.13973/j.cnki.robot.210303

An End-to-End Weakly Supervised Network Architecture for Event-based Visual Place Recognition

    Abstract: Traditional visual place recognition (VPR) methods generally rely on frame-based cameras, which makes them prone to failure under dramatic illumination changes or fast motion. To address this, an end-to-end VPR network using event cameras is proposed that achieves good VPR performance in challenging environments. The key idea is to first represent the event stream as an event spike tensor (EST) voxel grid, then extract features with a deep residual network, and finally aggregate the features with an improved VLAD (vector of locally aggregated descriptors) network, yielding end-to-end VPR directly from event streams. The proposed method is compared with typical frame-based VPR methods on event-based driving datasets (MVSEC, DDD17) and on synthetic event stream datasets (Oxford RobotCar). The results show that in challenging scenarios (such as night scenes) the proposed method outperforms frame-based VPR methods, with an improvement of approximately 6.61% in Recall@1. To our knowledge, this is the first end-to-end weakly supervised deep network architecture for the VPR task that directly processes event stream data.
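To make the first stage of the pipeline more concrete, the following is a minimal sketch of turning an event stream into a voxel grid. It is an illustration under stated assumptions, not the representation used in the paper: the function name `events_to_voxel_grid`, the `[x, y, t, p]` event layout, the two polarity channels, and the fixed triangular temporal kernel are all hypothetical choices made here for clarity.

```python
import numpy as np


def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events into a (2, num_bins, height, width) grid.

    Assumed (hypothetical) layout: `events` is an (N, 4) array of
    [x, y, t, p] rows with polarity p in {-1, +1}.
    """
    grid = np.zeros((2, num_bins, height, width), dtype=np.float64)
    if len(events) == 0:
        return grid

    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    pol = (events[:, 3] > 0).astype(np.int64)  # 0 = negative, 1 = positive

    # Rescale timestamps to the range [0, num_bins - 1].
    span = t.max() - t.min()
    if span > 0:
        t_norm = (t - t.min()) / span * (num_bins - 1)
    else:
        t_norm = np.zeros_like(t)

    # Triangular kernel (an assumption): each event splits its unit weight
    # between the two nearest temporal bins.
    lower = np.floor(t_norm).astype(np.int64)
    frac = t_norm - lower
    upper = np.minimum(lower + 1, num_bins - 1)

    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(grid, (pol, lower, y, x), 1.0 - frac)
    np.add.at(grid, (pol, upper, y, x), frac)
    return grid


if __name__ == "__main__":
    demo = np.array([
        [0, 0, 0.0, 1],
        [1, 0, 0.5, -1],
        [2, 1, 1.0, 1],
    ])
    voxels = events_to_voxel_grid(demo, num_bins=3, height=2, width=3)
    print(voxels.shape)
```

A grid of this kind can be fed to a convolutional backbone such as a residual network by treating the polarity and time-bin axes as input channels; the feature extraction and VLAD aggregation stages described in the abstract are not shown here.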

     
