A NEW MULTI-AGENT REINFORCEMENT LEARNING ALGORITHM AND ITS APPLICATION TO MULTI-ROBOT COOPERATION TASKS
GU Guo-chang1, ZHONG Yu1, ZHANG Ru-bo1,2
1. School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China; 2. Robotics Laboratory, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
Abstract: In multi-robot systems, joint actions must be employed to achieve cooperation because the evaluation of one robot's behavior often depends on the other robots' behaviors. However, joint-action reinforcement learning algorithms suffer from slow convergence because of the enormous learning space produced by joint actions. In this paper, a prediction-based reinforcement learning algorithm is presented for multi-robot cooperation tasks, which requires every robot to learn to predict the probabilities of the actions that the other robots may execute. A multi-robot cooperation experiment is conducted to test the efficacy of the new algorithm, and the experimental results show that the new algorithm learns the cooperation strategy much faster than the primitive reinforcement learning algorithm.
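The abstract describes the algorithm only at a high level, so the following is a minimal illustrative sketch rather than the authors' exact method: each robot maintains an empirical frequency model of the other robot's actions in every state and selects its own action by maximizing the joint-action Q-value averaged over those predicted probabilities. The class name PredictiveQAgent, the two-robot setting, and the parameters alpha, gamma, and epsilon are assumptions introduced for illustration.

```python
import random
from collections import defaultdict

class PredictiveQAgent:
    """Hypothetical sketch of prediction-based joint-action Q-learning.

    The agent keeps a Q-table over (state, own_action, other_action) and an
    empirical count of the other robot's actions per state, which it uses to
    predict the probability of each action the other robot may execute.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions                      # shared action set (assumption)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)                 # Q[(state, my_a, other_a)]
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[state][other_a]

    def predict(self, state):
        """Empirical probabilities of the other robot's actions in `state`."""
        c = self.counts[state]
        total = sum(c.values())
        if total == 0:                              # no observations yet: uniform prior
            return {a: 1.0 / len(self.actions) for a in self.actions}
        return {a: c[a] / total for a in self.actions}

    def expected_value(self, state, my_a):
        """Q-value of my_a averaged over the predicted other-robot action."""
        p = self.predict(state)
        return sum(p[oa] * self.q[(state, my_a, oa)] for oa in self.actions)

    def choose_action(self, state):
        """Epsilon-greedy choice over expected (prediction-weighted) Q-values."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.expected_value(state, a))

    def update(self, state, my_a, other_a, reward, next_state):
        """Q-learning backup on the observed joint action plus prediction update."""
        self.counts[state][other_a] += 1            # refine the action-frequency model
        best_next = max(self.expected_value(next_state, a) for a in self.actions)
        key = (state, my_a, other_a)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

Under this reading, the joint-action value table is still learned, but each robot's action selection collapses the other robot's dimension through the predicted action distribution, which is one way the enormous joint-action space can be searched more efficiently than by treating every joint action as equally likely.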