MO Xiuyun, CHEN Junhong, YANG Zhenguo, LIU Wenyin. A Robotic Command Generation Framework Based on Human Demonstration Videos[J]. ROBOT, 2022, 44(2): 186-194, 202. DOI: 10.13973/j.cnki.robot.200539

A Robotic Command Generation Framework Based on Human Demonstration Videos

Abstract: To improve robots' skill-learning ability and avoid the manual teaching process, a sequence-to-sequence framework is proposed to automatically generate robotic commands by observing human demonstration videos without any special markers. Firstly, a Mask R-CNN (region-based convolutional neural network) is used to narrow down the manipulation area, and a two-stream I3D network (inflated 3D convolutional network) is adopted to extract optical-flow features as well as RGB features from the videos. Secondly, a bidirectional LSTM (long short-term memory) network is introduced to acquire context information from the extracted features. Finally, self-attention and global attention mechanisms are integrated to learn the correlation between the sequence of video frames and the sequence of commands, and the sequence-to-sequence model ultimately outputs the robotic commands. Extensive experiments are conducted on the expanded MPII Cooking 2 dataset and the IIT-V2C dataset. Compared with existing methods, the proposed method achieves state-of-the-art performance on metrics such as BLEU_4 (0.705) and METEOR (0.462). The results show that the proposed method can learn manipulation tasks from human demonstration videos. In particular, the framework is successfully applied to a Baxter robot.
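
As a rough illustration of the sequence-to-sequence stage described in the abstract, the PyTorch sketch below encodes a sequence of pre-extracted clip features (standing in for pooled two-stream I3D outputs) with a bidirectional LSTM and decodes command tokens with an LSTM decoder that applies global (Luong-style) attention over the encoder states. All module names, feature dimensions, and the vocabulary size are illustrative assumptions rather than the paper's implementation; the Mask R-CNN cropping, I3D feature extraction, and self-attention stage are omitted.

```python
import torch
import torch.nn as nn

class VideoToCommand(nn.Module):
    """Sketch of a video-to-command seq2seq model: a bidirectional LSTM
    encodes per-clip visual features, and an LSTM decoder with global
    attention over the encoder states emits command tokens."""

    def __init__(self, feat_dim=1024, hidden=256, vocab_size=500):
        super().__init__()
        # Encoder over a sequence of pre-extracted clip features
        # (e.g. pooled two-stream I3D outputs; sizes are assumptions).
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTMCell(hidden, 2 * hidden)
        self.attn = nn.Linear(2 * hidden, 2 * hidden)  # score = enc . W h
        self.out = nn.Linear(4 * hidden, vocab_size)

    def forward(self, feats, tokens):
        # feats:  (B, T, feat_dim) clip features
        # tokens: (B, L) ground-truth command token ids (teacher forcing)
        enc, _ = self.encoder(feats)                       # (B, T, 2H)
        B = feats.size(0)
        h = feats.new_zeros(B, enc.size(-1))
        c = feats.new_zeros(B, enc.size(-1))
        logits = []
        for t in range(tokens.size(1)):
            h, c = self.decoder(self.embed(tokens[:, t]), (h, c))
            # Global attention: weight encoder states by similarity to h.
            score = torch.bmm(enc, self.attn(h).unsqueeze(2))  # (B, T, 1)
            ctx = (torch.softmax(score, dim=1) * enc).sum(1)   # (B, 2H)
            logits.append(self.out(torch.cat([h, ctx], dim=1)))
        return torch.stack(logits, dim=1)                  # (B, L, vocab)

# Hypothetical usage with random tensors standing in for real features.
model = VideoToCommand()
feats = torch.randn(2, 30, 1024)        # batch of 2 videos, 30 clips each
tokens = torch.randint(0, 500, (2, 8))  # 8-token command sequences
print(model(feats, tokens).shape)       # torch.Size([2, 8, 500])
```

Decoupling feature extraction from command decoding in this way keeps the seq2seq stage lightweight: the expensive per-clip visual forward passes can be computed once per video and cached before training the decoder.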