A Monocular 3D Target Detection Network with Perspective Projection
ZHANG Junning1, SU Qunxing1,2, LIU Pengyuan1, GU Hongqiang1, WANG Wei3
1. Army Engineering University, Shijiazhuang 050003, China; 2. Army Command College, Nanjing 210016, China; 3. Military Representative Office of Military Equipment Department in Nanjing Area in Wuxi Area, Wuxi 214035, China
Abstract: To address the small number of training constraints and the low prediction accuracy of existing monocular 3D target detection networks, a monocular 3D target detection network with perspective projection is proposed through improvements to the network structure, the establishment of perspective projection constraints, and the optimization of the loss function. Firstly, based on the perspective projection mechanism, a 3D target bounding box model based on the vanishing point (VP) is established using the transformation relationships among the world, camera, and target coordinate frames. Secondly, by combining the spatial geometric relationships with prior size information, this model is simplified into a constraint relationship among the yaw angle, the target size, and the 3D bounding box. Finally, a learning-based azimuth-size loss function built on this constraint relationship is proposed, which takes full advantage of the single-peaked, easy-to-regress nature of the size constraint and thereby improves the learning efficiency and prediction accuracy of the network. In view of the lack of 3D center constraints in monocular 3D target detection networks, a training strategy that jointly constrains the azimuth, size, and 3D center during model training is proposed based on the spatial geometry of the 3D and 2D bounding boxes. Experiments on the KITTI and SUN RGB-D datasets show that the proposed model achieves better results and is more effective than other algorithms in 3D target detection.
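The perspective-projection constraint underlying the abstract can be sketched as follows: given a hypothesized yaw angle, target size, and 3D center, the eight corners of the 3D bounding box are projected through the camera intrinsics, and the resulting 2D extent must agree with the detected 2D bounding box. The snippet below is a minimal illustrative sketch of that geometry, not the paper's implementation; the intrinsic matrix values and the box parameters are assumed KITTI-like examples.

```python
import numpy as np

def box3d_corners(center, dims, yaw):
    """Return the 8 corners (3x8) of a 3D bounding box in camera coordinates.

    center: (x, y, z) box center; dims: (h, w, l); yaw: rotation about the
    camera's vertical (y) axis. Axis conventions here are illustrative.
    """
    h, w, l = dims
    # Corner offsets in the object frame (x: length, y: height, z: width).
    x = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    y = np.array([ h/2,  h/2,  h/2,  h/2, -h/2, -h/2, -h/2, -h/2])
    z = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[ c, 0, s],
                  [ 0, 1, 0],
                  [-s, 0, c]])  # rotation about the vertical axis (yaw)
    return R @ np.vstack([x, y, z]) + np.asarray(center).reshape(3, 1)

def project_to_2d_box(corners, K):
    """Project 3D corners with intrinsics K; return the tight 2D box (u1, v1, u2, v2)."""
    uvw = K @ corners          # perspective projection onto the image plane
    uv = uvw[:2] / uvw[2]      # divide by depth to get pixel coordinates
    return uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()

# Example with KITTI-like intrinsics (illustrative values, not calibration data).
K = np.array([[721.5,   0.0, 609.6],
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])
corners = box3d_corners(center=(2.0, 1.5, 20.0), dims=(1.5, 1.6, 3.9), yaw=0.3)
print(project_to_2d_box(corners, K))
```

During training, the discrepancy between this projected extent and the ground-truth 2D box can serve as the joint azimuth-size-center constraint described above, since all three quantities enter the projection.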