A Reinforcement Learning Method for Gliding Control of Underwater Gliding Snake-like Robot
ZHANG Xiaolu1,2,3, LI Bin2,3, CHANG Jian2,3, TANG Jingge2,3,4
1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China;
2. The State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China;
3. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China;
4. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: A reinforcement learning method for gliding control of an underwater gliding snake-like robot is studied. Because the hydrodynamic environment is difficult to model, a reinforcement learning approach is adopted so that the underwater gliding snake-like robot can adapt to the complex water environment and learn gliding actions automatically by adjusting its buoyancy alone. A Monte Carlo policy gradient algorithm using a recurrent neural network is proposed to address the difficulty of training when the robot state cannot be fully observed. Gliding control of the underwater gliding snake-like robot is approximated as a Markov decision process (MDP) so as to obtain an effective gliding control policy. Simulation and experiment results show the effectiveness of the proposed method.
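The method summarized above (a Monte Carlo policy gradient, i.e. REINFORCE, driven by a recurrent summary of partial observations, with buoyancy as the only actuation) can be sketched in minimal, dependency-free form. Everything concrete here is an assumption for illustration only: the 1-D depth-tracking environment, the 5 m target depth, and the exponential-moving-average feature that stands in for the paper's recurrent (LSTM) hidden state are hypothetical, not the authors' model.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def discounted_returns(rewards, gamma):
    # Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backward
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

class RecurrentSoftmaxPolicy:
    # Softmax policy over a recurrent feature: an exponential moving
    # average of past observations stands in (hypothetically) for the
    # LSTM hidden state, so the partially observed velocity can still
    # be inferred from the depth history.
    def __init__(self, obs_dim, n_actions, decay=0.8):
        self.n_features = obs_dim + 1              # +1 bias input
        self.w = [[0.0] * self.n_features for _ in range(n_actions)]
        self.decay = decay
        self.h = None

    def reset(self):
        self.h = None

    def features(self, obs):
        if self.h is None:
            self.h = list(obs)
        else:
            self.h = [self.decay * h + (1.0 - self.decay) * o
                      for h, o in zip(self.h, obs)]
        return self.h + [1.0]

    def act(self, obs, rng):
        x = self.features(obs)
        probs = softmax([sum(wi * xi for wi, xi in zip(row, x))
                         for row in self.w])
        a = rng.choices(range(len(probs)), weights=probs)[0]
        return a, probs, x

def glider_env_step(state, action):
    # Hypothetical 1-D buoyancy model: action 1 raises buoyancy, 0
    # lowers it; only depth is observed (velocity is hidden).
    depth, vel = state
    vel += (0.1 if action == 1 else -0.1) - 0.05 * vel
    depth += vel
    reward = -abs(depth - 5.0)                     # track 5 m depth
    return (depth, vel), (depth,), reward

def train(episodes=200, steps=30, gamma=0.95, lr=0.01, seed=0):
    rng = random.Random(seed)
    policy = RecurrentSoftmaxPolicy(obs_dim=1, n_actions=2)
    for _ in range(episodes):
        policy.reset()
        state, obs = (0.0, 0.0), (0.0,)
        traj, rewards = [], []
        for _ in range(steps):
            a, probs, x = policy.act(obs, rng)
            state, obs, r = glider_env_step(state, a)
            traj.append((a, probs, x))
            rewards.append(r)
        returns = discounted_returns(rewards, gamma)
        # REINFORCE for a softmax policy: d log pi / d w[k] is
        # (1{k == a} - pi_k) * x, scaled by the sampled return G_t
        for (a, probs, x), g in zip(traj, returns):
            for k in range(len(policy.w)):
                grad = (1.0 if k == a else 0.0) - probs[k]
                for j in range(policy.n_features):
                    policy.w[k][j] += lr * g * grad * x[j]
    return policy
```

Because the update uses full-episode sampled returns rather than a learned value function, this is the model-free Monte Carlo setting the abstract describes; replacing the moving-average feature with an LSTM would recover the recurrent-network variant.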