MO Xiuyun, CHEN Junhong, YANG Zhenguo, LIU Wenyin. A Robotic Command Generation Framework Based on Human Demonstration Videos[J]. ROBOT, 2022, 44(2): 186-194, 202. DOI: 10.13973/j.cnki.robot.200539

A Robotic Command Generation Framework Based on Human Demonstration Videos

Abstract: To improve robots' skill-learning ability and avoid the manual teaching process, a sequence-to-sequence framework is proposed to automatically generate robotic commands by observing human demonstration videos without any special markers. Firstly, a Mask R-CNN (region-based convolutional neural network) is used to narrow down the manipulation area, and a two-stream I3D network (inflated 3D convolutional network) is adopted to extract optical-flow features as well as RGB features from the videos. Secondly, a bidirectional LSTM (long short-term memory) network is introduced to acquire context information from the extracted features. Finally, self-attention and global attention mechanisms are integrated to learn the correlation between the sequence of video frames and the sequence of commands, and the sequence-to-sequence model ultimately outputs the robotic commands. Extensive experiments are conducted on the expanded MPII Cooking 2 dataset and the IIT-V2C dataset. Compared with existing methods, the proposed method achieves state-of-the-art performance on metrics such as BLEU_4 (0.705) and METEOR (0.462). The results show that the proposed method can learn manipulation tasks from human demonstration videos. In particular, the framework is successfully applied to a Baxter robot.
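
As a rough illustration of the sequence-to-sequence stage described in the abstract, the PyTorch sketch below encodes a sequence of pre-extracted clip features (standing in for pooled two-stream I3D outputs) with a bidirectional LSTM and decodes command tokens with an LSTM decoder that applies global (Luong-style) attention over the encoder states. All module names, feature dimensions, and the vocabulary size are illustrative assumptions rather than the paper's implementation; the Mask R-CNN cropping, I3D feature extraction, and self-attention stage are omitted.

```python
import torch
import torch.nn as nn

class VideoToCommand(nn.Module):
    """Sketch of a video-to-command seq2seq model: a bidirectional LSTM
    encodes per-clip visual features, and an LSTM decoder with global
    attention over the encoder states emits command tokens."""

    def __init__(self, feat_dim=1024, hidden=256, vocab_size=500):
        super().__init__()
        # Encoder over a sequence of pre-extracted clip features
        # (e.g. pooled two-stream I3D outputs; sizes are assumptions).
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTMCell(hidden, 2 * hidden)
        self.attn = nn.Linear(2 * hidden, 2 * hidden)  # score = enc . W h
        self.out = nn.Linear(4 * hidden, vocab_size)

    def forward(self, feats, tokens):
        # feats:  (B, T, feat_dim) clip features
        # tokens: (B, L) ground-truth command token ids (teacher forcing)
        enc, _ = self.encoder(feats)                       # (B, T, 2H)
        B = feats.size(0)
        h = feats.new_zeros(B, enc.size(-1))
        c = feats.new_zeros(B, enc.size(-1))
        logits = []
        for t in range(tokens.size(1)):
            h, c = self.decoder(self.embed(tokens[:, t]), (h, c))
            # Global attention: weight encoder states by similarity to h.
            score = torch.bmm(enc, self.attn(h).unsqueeze(2))  # (B, T, 1)
            ctx = (torch.softmax(score, dim=1) * enc).sum(1)   # (B, 2H)
            logits.append(self.out(torch.cat([h, ctx], dim=1)))
        return torch.stack(logits, dim=1)                  # (B, L, vocab)

# Hypothetical usage with random tensors standing in for real features.
model = VideoToCommand()
feats = torch.randn(2, 30, 1024)        # batch of 2 videos, 30 clips each
tokens = torch.randint(0, 500, (2, 8))  # 8-token command sequences
print(model(feats, tokens).shape)       # torch.Size([2, 8, 500])
```

Decoupling feature extraction from command decoding in this way keeps the seq2seq stage lightweight: the expensive per-clip visual forward passes can be computed once per video and cached before training the decoder.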