Q-Learning Algorithm Based on Incremental RBF Network
HU Yanming1,2,3, LI Decai1,2, HE Yuqing1,2, HAN Jianda1,2,4
1. The State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China;
2. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China;
3. University of Chinese Academy of Sciences, Beijing 100049, China;
4. College of Artificial Intelligence, Nankai University, Tianjin 300071, China
Abstract: An incremental radial basis function network (IRBFN) based Q-learning (IRBFN-QL) algorithm is proposed to improve the behavioral intelligence of robots. The key idea is to learn and store the Q-value function through adaptive growth of the network structure and online learning of its parameters, so that a robot can learn a behavioral strategy autonomously and incrementally in an unknown environment. Firstly, the approximate linear dependence (ALD) criterion is used to add network nodes online, so that the memory capacity of the robot grows adaptively as the state space expands. The newly added nodes change the internal connections of the network topology. The kernel recursive least squares (KRLS) algorithm is used to update the topology connections and their parameters, so that the robot can continuously extend and optimize its behavioral strategy. In addition, an L2 regularization term is integrated into the KRLS algorithm to avoid overfitting, yielding the L2-constrained KRLS (L2KRLS) algorithm. Experimental results show that the IRBFN-QL algorithm enables autonomous interaction between the robot and an unknown environment, and gradually improves the navigation ability of a mobile robot in corridor environments.
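The abstract describes two mechanisms: node growth governed by the ALD criterion and online weight updating by a regularized KRLS recursion. The Python sketch below is not the authors' implementation; it illustrates one plausible realization under stated assumptions. The Gaussian kernel width sigma, the ALD threshold nu, and folding the L2 term into the kernel diagonal as a ridge (lam) are illustrative choices, and the scalar target passed to update() would be the usual Q-learning target r + gamma * max_a Q(s', a).

import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel between two feature vectors.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

class IncrementalRBFN:
    # Growing RBF network: the ALD criterion decides when a node is added,
    # and a KRLS-style recursion refines the output weights online.
    def __init__(self, nu=0.01, lam=1e-3, sigma=1.0):
        self.nu = nu          # ALD threshold: a larger value adds fewer nodes
        self.lam = lam        # ridge on the kernel diagonal (assumed form of the L2 constraint)
        self.sigma = sigma    # RBF width
        self.centers = []     # dictionary of RBF centers (the network nodes)
        self.K_inv = None     # inverse of the regularized dictionary kernel matrix
        self.P = None         # auxiliary matrix for the non-growing KRLS update
        self.alpha = None     # output-layer weights

    def _k_vec(self, x):
        return np.array([rbf_kernel(x, c, self.sigma) for c in self.centers])

    def predict(self, x):
        # Network output, e.g. the approximated Q-value of a state(-action) feature.
        if not self.centers:
            return 0.0
        return float(self._k_vec(x) @ self.alpha)

    def update(self, x, target):
        # One online sample (x, target); the network grows if x is novel under ALD.
        x = np.asarray(x, dtype=float)
        if not self.centers:
            k_xx = rbf_kernel(x, x, self.sigma) + self.lam
            self.centers = [x]
            self.K_inv = np.array([[1.0 / k_xx]])
            self.P = np.array([[1.0]])
            self.alpha = np.array([target / k_xx])
            return
        k = self._k_vec(x)
        a = self.K_inv @ k                                       # projection onto existing nodes
        delta = rbf_kernel(x, x, self.sigma) + self.lam - k @ a  # ALD residual
        err = target - k @ self.alpha                            # prediction error
        if delta > self.nu:
            # x is approximately linearly independent of the current nodes:
            # add a new center and enlarge K_inv by block matrix inversion.
            n = len(self.centers)
            self.centers.append(x)
            K_inv_new = np.empty((n + 1, n + 1))
            K_inv_new[:n, :n] = self.K_inv * delta + np.outer(a, a)
            K_inv_new[:n, n] = -a
            K_inv_new[n, :n] = -a
            K_inv_new[n, n] = 1.0
            self.K_inv = K_inv_new / delta
            P_new = np.zeros((n + 1, n + 1))
            P_new[:n, :n] = self.P
            P_new[n, n] = 1.0
            self.P = P_new
            self.alpha = np.concatenate([self.alpha - a * (err / delta), [err / delta]])
        else:
            # Dictionary unchanged: standard KRLS refinement of the weights only.
            Pa = self.P @ a
            q = Pa / (1.0 + a @ Pa)
            self.P = self.P - np.outer(q, a @ self.P)
            self.alpha = self.alpha + (self.K_inv @ q) * err

In a Q-learning setting, one such network could be kept per discrete action, or the action could be appended to the state feature vector; predict() then supplies Q(s, a) for action selection and update() is called once per interaction step with the bootstrapped Q-target. These usage details are an assumption for illustration rather than the paper's specific architecture.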