A Survey of Robot Manipulation Behavior Research Based on Deep Reinforcement Learning
CHEN Jiapan1, ZHENG Minhua1,2
1. School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing 100044, China; 2. Key Laboratory of Vehicle Advanced Manufacturing, Measuring and Control Technology (Beijing Jiaotong University), Ministry of Education, Beijing 100044, China
CHEN Jiapan, ZHENG Minhua. A Survey of Robot Manipulation Behavior Research Based on Deep Reinforcement Learning. ROBOT, 2022, 44(2): 236-256. DOI: 10.13973/j.cnki.robot.210008.
Abstract: By summarizing previous studies, the basic theories and algorithms of deep learning and reinforcement learning are first introduced. Then, popular DRL (deep reinforcement learning) algorithms and their applications to robot manipulation are summarized. Finally, future directions for applying DRL to robot manipulation are forecast in light of current problems and possible solutions.
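As background for the DRL algorithms surveyed in the paper, the following is a minimal, self-contained sketch of the agent-environment interaction loop and the temporal-difference update that deep RL builds on. The toy 1-D reaching task, the tabular Q-learning formulation, and all hyperparameters are illustrative assumptions, not material from the article or the works it cites.

```python
# Minimal sketch of the reinforcement learning loop underlying the surveyed
# DRL methods. Assumptions (not from the article): a toy 1-D "reach the
# target" task with 11 gripper positions, tabular Q-learning instead of a
# deep network, and hyperparameters chosen only for illustration.
import numpy as np

N_STATES = 11          # gripper positions 0..10 along a line
GOAL = 10              # index of the target position
ACTIONS = [-1, +1]     # move left / move right by one cell

def step(state, action_idx):
    """Apply an action and return (next_state, reward, done)."""
    next_state = int(np.clip(state + ACTIONS[action_idx], 0, N_STATES - 1))
    done = next_state == GOAL
    reward = 1.0 if done else -0.01   # small step cost, bonus at the goal
    return next_state, reward, done

q = np.zeros((N_STATES, len(ACTIONS)))   # tabular action-value estimates
alpha, gamma, eps = 0.1, 0.95, 0.1       # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

for episode in range(200):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if rng.random() < eps:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(q[state]))
        next_state, reward, done = step(state, a)
        # Temporal-difference (Q-learning) update; DQN-style methods replace
        # the table q with a neural network trained toward the same target.
        target = reward + (0.0 if done else gamma * np.max(q[next_state]))
        q[state, a] += alpha * (target - q[state, a])
        state = next_state

# Greedy action per non-goal state after training; index 1 means "move right".
print(q.argmax(axis=1)[:GOAL])
```

In the visual manipulation settings covered by this survey, the value table above is replaced by a deep network that maps raw observations such as camera images to action values or directly to a policy, which is what distinguishes DRL from classical RL.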