XU Shengjun, REN Junlin, LIU Guanghui, MENG Yuebo, HAN Jiuqiang. Lightweight Encoding-Decoding Grasp Pose Detection Based on a Context Aggregation Strategy[J]. ROBOT, 2023, 45(6): 641-654. DOI: 10.13973/j.cnki.robot.220445

Lightweight Encoding-Decoding Grasp Pose Detection Based on a Context Aggregation Strategy

  • Estimating the grasp pose of diverse targets in an unstructured environment is difficult. To address this problem, a lightweight encoding-decoding grasp pose detection network based on a context aggregation strategy is proposed. First, within an encoding-decoding architecture, a deep separation-fusion block for target feature extraction is built from depthwise separable convolutions and shuffle units, which reduces the number of encoder parameters and strengthens the network's ability to extract features of the grasp region. Then, bilinear interpolation and depthwise separable convolution are combined into a deep separation-reconstruction block, which reduces the decoder parameters while restoring information lost from high-level features. Finally, because the pixels of the graspable area are inconsistent with those of the whole target object, a grasp region context aggregation strategy based on a cross-entropy auxiliary loss and a self-attention mechanism is proposed; it guides the network to strengthen the feature representation of the graspable target area and to suppress redundant features from non-graspable pixels. Experimental results show that the grasp detection accuracy of the proposed network reaches 97.8% and 93.8% on the image-wise and object-wise subsets of the Cornell dataset, respectively, with a detection speed of 64.93 frame/s for a single image; on the Jacquard dataset, the detection accuracy reaches 95.1% with a detection speed of 60.6 frame/s. Compared with the baseline networks, the proposed network requires less computation and fewer parameters while significantly improving the accuracy and speed of grasp detection. In real-scene grasp detection experiments on 9 objects, the grasp success rate reaches 93.3%.
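
The abstract attributes the parameter savings to depthwise separable convolution and shuffle units. The sketch below illustrates these two generic building blocks only; it is not the authors' network. The function names, the example layer sizes (64 input channels, 128 output channels, 3 x 3 kernel), and the ShuffleNet-style interleaving are assumptions chosen for illustration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style shuffle).

    Equivalent to viewing the channel list as a (groups, per_group)
    grid, transposing it, and flattening the result.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Hypothetical layer sizes, not taken from the paper.
print(standard_conv_params(64, 128, 3))        # 73728
print(depthwise_separable_params(64, 128, 3))  # 8768
print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))  # [0, 3, 1, 4, 2, 5]
```

For this hypothetical layer, the separable form needs roughly one eighth of the weights of the standard convolution, and the shuffle lets information cross between channel groups at no parameter cost.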