Abstract: A distributed kernel-based reinforcement learning method is proposed to optimize multi-robot formation control. First, basic formation control is realized with a distributed leader-follower strategy in which a virtual leader robot is added. Second, a kernel-based reinforcement learning method is proposed that combines least squares policy iteration with least squares policy evaluation. Kernel-based least squares policy iteration is used offline to obtain an initial optimal formation control policy, and kernel-based least squares policy evaluation is then used online to optimize that policy. Finally, experimental results for formation control show that the proposed method optimizes the control policy adaptively and improves multi-robot formation control performance.
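As an illustration of the virtual-leader idea, the following is a minimal sketch, not the authors' controller. It assumes each follower is assigned a fixed offset in the virtual leader's body frame, which the abstract does not state. The names `follower_targets` and `formation_error`, and the triangle offsets, are hypothetical.

```python
import math


def follower_targets(leader_pose, offsets):
    """Return the desired world-frame (x, y) of each follower.

    leader_pose: (x, y, heading in radians) of the virtual leader.
    offsets: list of (forward, left) offsets in the leader's body frame
             (an assumed formation description, not taken from the paper).
    """
    x, y, theta = leader_pose
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    targets = []
    for forward, left in offsets:
        # Rotate the body-frame offset into the world frame, then translate.
        targets.append((x + forward * cos_t - left * sin_t,
                        y + forward * sin_t + left * cos_t))
    return targets


def formation_error(position, target):
    """Euclidean distance between a follower and its desired position."""
    return math.hypot(target[0] - position[0], target[1] - position[1])


# Hypothetical triangle formation: two followers behind a leader heading "up".
targets = follower_targets((1.0, 2.0, math.pi / 2),
                           [(-1.0, 0.5), (-1.0, -0.5)])
```

Because each follower needs only the virtual leader's pose and its own offset, the computation can run independently on every robot. A distance such as `formation_error` is one plausible quantity for a learned policy to reduce, although the abstract does not specify the reward or state definition.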