Q-Learning Algorithm Based on Incremental RBF Network
HU Yanming1,2,3, LI Decai1,2, HE Yuqing1,2, HAN Jianda1,2,4
1. The State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China;
2. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China;
3. University of Chinese Academy of Sciences, Beijing 100049, China;
4. College of Artificial Intelligence, Nankai University, Tianjin 300071, China
Abstract: A Q-learning algorithm based on an incremental radial basis function network (IRBFN-QL) is proposed to improve the behavioral intelligence of robots. Its core idea is to learn and store the Q-value function through adaptive growth of the network structure and online learning of its parameters, so that a robot can acquire behavioral strategies autonomously and incrementally in an unknown environment. First, the approximate linear dependence (ALD) criterion is used to add network nodes online, so that the robot's memory capacity grows adaptively as the explored state space expands. Newly added nodes change the internal connections of the network topology, and the kernel recursive least squares (KRLS) algorithm is used to update these connections and their parameters, allowing the robot to extend and refine its behavioral strategy continually. In addition, an L2 regularization term is incorporated into the KRLS algorithm to avoid overfitting, yielding the L2-constrained KRLS (L2KRLS) algorithm. Experimental results show that the IRBFN-QL algorithm enables autonomous interaction between the robot and an unknown environment and gradually improves the navigation ability of a mobile robot in corridor environments.
HU Yanming, LI Decai, HE Yuqing, HAN Jianda. Q-Learning Algorithm Based on Incremental RBF Network. ROBOT, 2019, 41(5): 562-573.
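The mechanism described in the abstract — grow an RBF node when a state fails the ALD novelty test, then refit the kernel weights with an L2-regularized least-squares solve — can be sketched compactly. The Python below is a minimal illustration under stated assumptions, not the paper's implementation: it substitutes a simplified batch refit for the recursive L2KRLS update, assumes a fixed Gaussian kernel width, and uses a hypothetical one-network-per-action layout for discrete-action Q-learning; the class name IRBFN, the parameter names sigma, ald_thresh, and lam, and the blending rule for non-novel samples are all illustrative choices.

import numpy as np

class IRBFN:
    """Minimal sketch of an incremental RBF network approximating a Q-value
    function. Node growth follows the ALD novelty test; the weights come from
    an L2-regularized kernel least-squares fit, used here as a simplified
    batch stand-in for the paper's recursive L2KRLS update."""

    def __init__(self, sigma=0.5, ald_thresh=0.05, lam=1e-3):
        self.sigma = sigma          # Gaussian kernel width (assumed fixed)
        self.nu = ald_thresh        # ALD threshold controlling node growth
        self.lam = lam              # L2 regularization strength
        self.centers = []           # RBF centers: the network's "memory"
        self.targets = []           # Q-target stored per center
        self.alpha = np.zeros(0)    # kernel weights

    def _k(self, x, y):
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * self.sigma ** 2))

    def _kvec(self, x):
        return np.array([self._k(x, c) for c in self.centers])

    def _gram(self):
        return np.array([[self._k(a, b) for b in self.centers]
                         for a in self.centers])

    def predict(self, x):
        return float(self._kvec(x) @ self.alpha) if self.centers else 0.0

    def update(self, x, q_target):
        x = np.asarray(x, dtype=float)
        if self.centers:
            # ALD test: residual of projecting the new feature onto the span
            # of the stored centers; a large residual means the state is novel.
            kx = self._kvec(x)
            a = np.linalg.solve(self._gram()
                                + 1e-8 * np.eye(len(self.centers)), kx)
            delta = self._k(x, x) - kx @ a
        else:
            delta = np.inf
        if delta > self.nu:
            # Novel state: grow a node, so memory expands with the state space.
            self.centers.append(x)
            self.targets.append(float(q_target))
        else:
            # Known region: blend the target into the most activated center
            # (an illustrative simplification of the recursive weight update).
            j = int(np.argmax(kx))
            self.targets[j] += 0.5 * (float(q_target) - self.targets[j])
        # L2-regularized solve, alpha = (K + lam*I)^{-1} y: the lam*I term
        # counters overfitting and keeps the system well conditioned.
        K = self._gram()
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(K)),
                                     np.array(self.targets))

# Hypothetical use in one Q-learning step with four discrete actions:
# one network per action, updated toward the usual TD target.
nets = {a: IRBFN() for a in range(4)}
s, s_next = np.array([0.0, 0.0]), np.array([0.1, 0.0])
action, reward, gamma = 0, 1.0, 0.9
td_target = reward + gamma * max(n.predict(s_next) for n in nets.values())
nets[action].update(s, td_target)

A per-action network is only one way to handle the action dimension; the paper's corridor-navigation experiments may encode state and action differently, and the recursive L2KRLS update would avoid the repeated Gram-matrix solves used in this batch sketch.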