Fast Planar Grasp Pose Detection for Robot Based on Cascaded Deep Convolutional Neural Networks
XIA Jing1,2, QIAN Kun1,2, MA Xudong1,2, LIU Huan1,2
1. School of Automation, Southeast University, Nanjing 210096, China;
2. Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing 210096, China
Abstract: A fast planar grasp pose detection method for robots, based on cascaded convolutional neural networks, is proposed to detect grasp poses for unknown irregular objects in arbitrary poses. A cascaded two-stage convolutional neural network model is established that estimates grasp position and attitude from coarse to fine scale, and a transfer-learning mechanism is used to train the model on small-scale datasets. In the first stage, candidate grasp bounding boxes are extracted and a coarse grasp angle is estimated with an R-FCN (region-based fully convolutional network) model. In the second stage, Angle-Net is proposed to address the low angle-estimation accuracy of previous methods and to estimate grasp angles with higher precision. Validation on the Cornell grasp dataset and online grasping experiments on a real robot indicate that the proposed method can quickly compute the optimal grasp point and attitude for irregular objects of any shape and pose. Compared with previous methods, detection accuracy and speed are improved; the method is robust and stable, and it generalizes to novel, untrained objects.
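The coarse-to-fine cascade described in the abstract (a detector proposing grasp boxes with a coarse angle bin, followed by Angle-Net regressing a fine angle on the cropped region) can be summarized as a short sketch. This is a minimal illustration in PyTorch, not the paper's implementation: the detector interface, the AngleNet layer sizes, the 45-degree coarse bin width, and the 64x64 crop size are all assumptions made for the example.

```python
# Minimal sketch of the coarse-to-fine grasp cascade, assuming PyTorch.
# The detector interface, layer sizes, 45-degree coarse bins, and the
# 64x64 crop size are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngleNet(nn.Module):
    """Small CNN that regresses a fine angle offset from a cropped patch."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # residual angle within the coarse bin

    def forward(self, patch):
        return self.head(self.features(patch).flatten(1))

def detect_grasp(image, detector, angle_net, bin_width_deg=45.0):
    """Stage 1: `detector` (an R-FCN-style model, hypothetical interface)
    returns candidate boxes (N, 4), coarse angle-bin indices (N,), and
    confidence scores (N,). Stage 2: Angle-Net refines the angle of the
    highest-scoring candidate from its cropped image patch."""
    boxes, coarse_bins, scores = detector(image)
    best = scores.argmax()
    x0, y0, x1, y1 = boxes[best].int().tolist()
    patch = image[:, :, y0:y1, x0:x1]          # image is an NCHW float tensor
    patch = F.interpolate(patch, size=(64, 64))
    fine_offset = angle_net(patch).item()      # degrees within the coarse bin
    angle = coarse_bins[best].item() * bin_width_deg + fine_offset
    return boxes[best], angle                  # grasp box and final angle
```

In the paper's setup, the first-stage detector is an R-FCN trained via transfer learning on a small grasp dataset; the sketch treats it as a black box and shows only how the two stages compose.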