
High-precision Real-time Object Detection Based on Bird's Eye View from 3D Point Clouds

    Abstract: A high-precision, real-time single-stage deep neural network is proposed for object detection from 3D point clouds, with new solutions in three areas: feature extraction, loss function design, and training data augmentation. First, the point cloud is voxelized directly to build a bird's eye view (BEV). In the feature extraction stage, residual structures extract high-level semantic features, and multi-level features are fused to output a dense feature map. While regressing the bounding boxes of objects on the BEV, the loss function takes the quadratic offset into account to converge with higher precision. During training, 3D point clouds from different frames are mixed as data augmentation to improve the generalization of the network. Experimental results on the KITTI BEV object detection dataset show that the proposed network, using only the position information of the lidar point cloud, outperforms not only the state-of-the-art BEV object detection networks but also detection methods that fuse images and point clouds. The entire network runs at 20 frames/s, which meets the real-time requirement.
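    To make the first and last steps of the abstract concrete, the following is a minimal sketch of direct voxelization into a BEV occupancy grid and of mixing point clouds from two frames. It is an illustration under stated assumptions, not the network's actual preprocessing: the function names `voxelize_to_bev` and `mix_frames`, the default ranges and voxel sizes, the binary occupancy encoding, and the reading of "mixing" as a plain union of points and box labels are all hypothetical choices that the abstract does not specify.

```python
import math
from typing import Iterable, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]   # (x, y, z) position only, no intensity
Voxel = Tuple[int, int, int]         # (ix, iy, iz) integer grid index


def voxelize_to_bev(
    points: Iterable[Point],
    x_range: Tuple[float, float] = (0.0, 70.0),    # assumed crop, metres
    y_range: Tuple[float, float] = (-40.0, 40.0),  # assumed crop, metres
    z_range: Tuple[float, float] = (-2.5, 1.0),    # assumed crop, metres
    voxel_size: Tuple[float, float, float] = (0.1, 0.1, 0.5),  # assumed
) -> Set[Voxel]:
    """Return the set of occupied voxels (a sparse binary BEV grid).

    Height slices (iz) would act as the channels of the BEV image.
    Points outside the crop are discarded.
    """
    lows = (x_range[0], y_range[0], z_range[0])
    highs = (x_range[1], y_range[1], z_range[1])
    occupied: Set[Voxel] = set()
    for point in points:
        # Keep only points inside the half-open box [low, high).
        if not all(lo <= c < hi for c, lo, hi in zip(point, lows, highs)):
            continue
        index = tuple(
            int(math.floor((c - lo) / size))
            for c, lo, size in zip(point, lows, voxel_size)
        )
        occupied.add(index)  # several points in one voxel collapse to one
    return occupied


def mix_frames(
    points_a: Sequence[Point],
    boxes_a: Sequence[tuple],
    points_b: Sequence[Point],
    boxes_b: Sequence[tuple],
) -> Tuple[List[Point], List[tuple]]:
    """Assumed form of frame mixing: union of points and of box labels."""
    return list(points_a) + list(points_b), list(boxes_a) + list(boxes_b)
```

    A sparse set of indices is used here only to keep the sketch dependency-free; a training pipeline would more likely fill a dense array with one channel per height slice.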


