Visual Segmentation of Unknown Objects Based on Inference of Object Logic States
BAO Jiatong1,2, SONG Aiguo2, HONG Ze1, SHEN Tianhe1, TANG Hongru1
1. School of Hydraulic, Energy and Power Engineering, Yangzhou University, Yangzhou 225300, China;
2. School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China
BAO Jiatong, SONG Aiguo, HONG Ze, SHEN Tianhe, TANG Hongru. Visual Segmentation of Unknown Objects Based on Inference of Object Logic States. ROBOT, 2017, 39(4): 431-438. DOI: 10.13973/j.cnki.robot.2017.0431.
Abstract: To improve the visual perception capabilities of robots, a novel method is proposed for visually segmenting unknown objects based on inference of object logic states. At the semantic level, the space of object logic states is defined, and the logic states are inferred from the feedback of the robot's grasping actions on objects. At the data level, an RGB-D (RGB-depth) camera captures a colored 3D point cloud of the surrounding environment. Under the assumption that objects are always supported by a planar surface, all candidate object points are spatially clustered and segmented, yielding an initial set of unknown objects in which the logic state of each object is also initialized. When the logic states of objects change, predefined rules are activated and the corresponding point sets are recomputed in order to re-segment the point clouds of the changed objects; the set of changed objects is in turn used to update the space of object logic states. The proposed method is tested on our 7-DOF (degree of freedom) mobile manipulator, which is commanded to segment and grasp unknown objects in a mock environment built from real blocks. Experimental results demonstrate that the proposed method effectively improves the visual perception capabilities of robots.
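The data-level initialization described in the abstract (plane-supported objects, spatial clustering into an initial unknown-object set, and per-object logic-state initialization) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the paper does not prescribe a library, and the Open3D calls, parameter values, and logic-state label used here are assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): remove the supporting
# plane from an RGB-D point cloud, spatially cluster the remaining points
# into candidate unknown objects, and initialize a logic state for each one.
import numpy as np
import open3d as o3d  # assumed dependency, Open3D >= 0.10

UNKNOWN = "unknown"  # hypothetical initial logic state; grasp feedback would
                     # later drive state transitions and re-segmentation rules

def initial_object_segmentation(cloud_path,
                                plane_dist=0.01,   # RANSAC inlier threshold [m]
                                cluster_eps=0.02,  # clustering radius [m]
                                min_points=50):
    """Return a list of {cloud, state} dicts, one per candidate object."""
    cloud = o3d.io.read_point_cloud(cloud_path)

    # 1. Fit the dominant (supporting) plane with RANSAC and discard its
    #    inliers, following the planar-support assumption.
    _, plane_inliers = cloud.segment_plane(distance_threshold=plane_dist,
                                           ransac_n=3,
                                           num_iterations=1000)
    above_plane = cloud.select_by_index(plane_inliers, invert=True)

    # 2. Spatially cluster the remaining points; each cluster becomes one
    #    element of the initial unknown-object set.
    labels = np.asarray(above_plane.cluster_dbscan(eps=cluster_eps,
                                                   min_points=min_points))
    objects = []
    if labels.size == 0:
        return objects
    for label in range(labels.max() + 1):  # label -1 marks noise and is skipped
        idx = np.where(labels == label)[0].tolist()
        objects.append({"cloud": above_plane.select_by_index(idx),
                        "state": UNKNOWN})
    return objects
```

In the full pipeline described by the abstract, success or failure feedback from a grasp would then change an object's logic state, activating the predefined rules that re-segment only the affected point clouds rather than the whole scene.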