王德明, 颜熠, 周光亮, 李勇奇, 刘成菊, 林立民, 陈启军. 基于实例分割网络与迭代优化方法的3D视觉分拣系统[J]. 机器人, 2019, 41(5): 637-648. DOI: 10.13973/j.cnki.robot.180806.
WANG Deming, YAN Yi, ZHOU Guangliang, LI Yongqi, LIU Chengju, LIN Limin, CHEN Qijun. 3D Vision-Based Picking System with Instance Segmentation Network and Iterative Optimization Method. ROBOT, 2019, 41(5): 637-648. DOI: 10.13973/j.cnki.robot.180806.
Abstract: A workpiece recognition and picking system based on an instance segmentation network and an iterative optimization method is proposed for object detection and pose estimation of scattered, stacked, texture-less industrial objects. The system consists of three modules: an image acquisition module, a target detection module, and a pose estimation module. In the image acquisition module, a dual RGB-D (RGB-depth) camera structure is designed to obtain higher-quality depth data by merging three depth images. The target detection module modifies the instance segmentation network Mask R-CNN (region-based convolutional neural network): the modified network takes as input both RGB images and HHA (horizontal disparity, height above ground, angle with gravity) features that encode three-dimensional information, and embeds STN (spatial transformer network) modules to improve the segmentation of texture-less objects. The module then combines the segmentation result with the point cloud to extract the target point cloud. On this basis, the pose estimation module matches the segmented point cloud against the target model using an improved 4PCS (4-points congruent set) algorithm and refines the pose with the ICP (iterative closest point) algorithm, yielding the final pose estimate. The robot performs the picking action according to the estimated pose. Experimental results on our workpiece dataset and on the actual picking system indicate that the proposed method achieves fast target recognition and pose estimation for scattered and stacked objects of different shapes with little texture, and that its accuracy, with a position error of 1 mm and an angle error of 1°, meets the requirements of practical applications.
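For illustration, the following is a minimal, hypothetical sketch of the HHA encoding consumed by the detection module alongside the RGB image. The reference encoding of Gupta et al. additionally estimates the gravity direction and supporting surfaces from the point cloud; this stand-in assumes a level camera with its y axis pointing down, and all names and parameters are illustrative rather than taken from the paper.

```python
# Simplified HHA encoding: horizontal disparity, height above ground,
# and angle of the surface normal with gravity, packed as an 8-bit image.
import numpy as np

def encode_hha(depth_m, fx, fy, cx, cy):
    """Encode a metric depth map (H x W, meters) into a 3-channel HHA image."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = np.maximum(depth_m, 1e-6)            # guard invalid (zero) depth
    x = (u - cx) * z / fx                    # back-project to camera coords
    y = (v - cy) * z / fy
    gravity = np.array([0.0, 1.0, 0.0])      # assumption: +y is "down"

    # Channel 1: horizontal disparity (inverse depth).
    disparity = 1.0 / z

    # Channel 2: height above the lowest observed point, measured against gravity.
    height = -(x * gravity[0] + y * gravity[1] + z * gravity[2])
    height -= height.min()

    # Channel 3: angle between the surface normal and gravity. Normals come
    # from depth gradients, a cheap stand-in for the plane-fit normals used
    # in the reference implementation.
    dzdx = np.gradient(z, axis=1) * fx / z
    dzdy = np.gradient(z, axis=0) * fy / z
    n = np.dstack([-dzdx, -dzdy, np.ones_like(z)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    angle = np.degrees(np.arccos(np.clip(n @ gravity, -1.0, 1.0)))

    # Scale each channel to 8 bits so the network consumes HHA like an image.
    def to8(c):
        return (255.0 * (c - c.min()) / max(np.ptp(c), 1e-6)).astype(np.uint8)
    return np.dstack([to8(disparity), to8(height), to8(angle)])
```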
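The coarse-to-fine registration stage can likewise be sketched with Open3D (a recent API, 0.12 or later, is assumed). Open3D ships no 4PCS implementation, so RANSAC over FPFH feature matches stands in here for the paper's improved 4PCS coarse alignment; point-to-plane ICP then refines the pose. The voxel size and search radii are illustrative, not values from the paper.

```python
# Coarse-to-fine point cloud registration: RANSAC + FPFH (stand-in for 4PCS)
# followed by ICP refinement, returning the object pose as a 4x4 transform.
import open3d as o3d

def estimate_pose(segmented_pcd, model_pcd, voxel=0.003):
    """Align the model point cloud to the segmented scene point cloud."""
    src = model_pcd.voxel_down_sample(voxel)
    dst = segmented_pcd.voxel_down_sample(voxel)
    for pcd in (src, dst):
        # Normals are needed both by FPFH and by point-to-plane ICP.
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=2 * voxel, max_nn=30))

    def fpfh(pcd):
        return o3d.pipelines.registration.compute_fpfh_feature(
            pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=5 * voxel, max_nn=100))

    # Coarse alignment (stand-in for the paper's improved 4PCS).
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src, dst, fpfh(src), fpfh(dst), True, 1.5 * voxel)

    # Fine alignment: point-to-plane ICP seeded with the coarse transform.
    fine = o3d.pipelines.registration.registration_icp(
        src, dst, voxel, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return fine.transformation
```

In the full system described by the abstract, the segmented scene cloud would come from back-projecting the Mask R-CNN instance mask through the merged depth map.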