Abstract: Drawing on the characteristics of human object grasping, a three-level serial convolutional neural network (CNN) for grasp detection is proposed to achieve high-accuracy grasping of unknown objects. In the proposed three-level serial CNN, the first-level network roughly locates the object, which determines where the next level searches for grasping rectangles. The second-level network is a small network that extracts only a few features, so that it can quickly find the available grasping rectangles, which become the preselected candidates, and eliminate the unavailable ones. The third-level network is a larger network that extracts more features in order to re-evaluate each preselected grasping rectangle accurately and select the best one. The experimental results show that the grasping accuracy of the three-level serial CNN is 6.1% higher than that of a single CNN. Finally, the three-level CNN achieves high-accuracy grasping on a real Youbot.
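The coarse-to-fine control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three networks are replaced by plain callables, and every name here (`GraspRect`, `cascade_grasp_detection`, `propose`, `keep`, `threshold`, and the `toy_*` stand-ins) is a hypothetical introduced for illustration. How candidate rectangles are generated and how many are preselected are assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass(frozen=True)
class GraspRect:
    """Oriented grasping rectangle (hypothetical parameterization)."""
    x: float       # center x in pixels
    y: float       # center y in pixels
    angle: float   # gripper orientation in degrees
    width: float   # gripper opening
    height: float  # gripper plate size


Box = Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max


def cascade_grasp_detection(
    image: Any,
    locate: Callable[[Any], Box],
    propose: Callable[[Box], List[GraspRect]],
    fast_score: Callable[[Any, GraspRect], float],
    fine_score: Callable[[Any, GraspRect], float],
    keep: int = 5,
    threshold: float = 0.5,
) -> Optional[GraspRect]:
    # Level 1: rough object localization restricts the search region.
    box = locate(image)
    candidates = propose(box)  # assumed candidate generator inside the box

    # Level 2: a cheap scorer discards unavailable rectangles and keeps
    # at most `keep` preselected candidates, best first.
    scored = [(fast_score(image, r), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    preselected = [r for s, r in scored if s >= threshold][:keep]
    if not preselected:
        return None

    # Level 3: a more expensive scorer re-evaluates only the survivors.
    return max(preselected, key=lambda r: fine_score(image, r))


# Toy stand-ins for the three networks (they ignore the image entirely).
def toy_locate(image: Any) -> Box:
    return (40.0, 40.0, 80.0, 80.0)


def toy_propose(box: Box) -> List[GraspRect]:
    x0, y0, x1, y1 = box
    return [
        GraspRect(x, y, a, 30.0, 10.0)
        for x in (x0, (x0 + x1) / 2, x1)
        for y in (y0, (y0 + y1) / 2, y1)
        for a in (0.0, 45.0, 90.0)
    ]


def toy_fast_score(image: Any, r: GraspRect) -> float:
    # Coarse score: only the distance to the box center matters.
    return 1.0 - (abs(r.x - 60.0) + abs(r.y - 60.0)) / 40.0


def toy_fine_score(image: Any, r: GraspRect) -> float:
    # Finer score: also penalizes deviation from a 90 degree orientation.
    return toy_fast_score(image, r) - abs(r.angle - 90.0) / 180.0


best = cascade_grasp_detection(
    None, toy_locate, toy_propose, toy_fast_score, toy_fine_score
)
print(best)
```

The point of the arrangement is that the expensive scorer runs on at most `keep` rectangles rather than on every candidate, which is the speed and accuracy trade-off the abstract attributes to the second and third levels.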