Robotic Episodic Learning and Behaviour Control Integrated with Neuron Stimulation Mechanism
LIU Dong1, CONG Ming1, GAO Sen1, HAN Xiaodong1, DU Yu2
1. School of Mechanical Engineering, Dalian University of Technology, Dalian 116024, China;
2. Department of Mechanical Engineering, University of British Columbia, Vancouver V6T1Z4, Canada
Robot behaviour control under uncertainty suffers from the curse of dimensionality and from perceptual aliasing. To address these problems, a framework called the episodic memory-driven Markov decision process (EM-MDP) is proposed, which introduces a neuron stimulation mechanism to achieve self-learning of environmental experience and behaviour control under multi-source uncertainty. Firstly, an episodic memory model is built, and an activation and organization mechanism for state neurons is proposed based on cognitive neuroscience. Secondly, self-learning of episodic memory is realized by combining adaptive resonance theory (ART) and sparse distributed memory (SDM) through Hebbian learning rules, and a robot behaviour control strategy is established on the basis of neuron synaptic potentials. The robot can thereby evaluate past event sequences, predict the current state, and plan the desired behaviour. Finally, experimental results show that the model and the control strategy achieve the objectives of robot behaviour control in general scenarios.
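As a concrete illustration of the learning mechanism summarized above, the following Python sketch shows how an ART-style resonance test can recruit or update state neurons and how a Hebbian rule can strengthen synapses between consecutively activated neurons. It is a minimal sketch under stated assumptions: the class name, vigilance threshold, learning rate, and the use of normalised feature vectors are all illustrative choices, not the paper's implementation, and the SDM component is omitted.

```python
import numpy as np

class EpisodicMemory:
    """Illustrative ART-plus-Hebbian store of state neurons (not the paper's code)."""

    def __init__(self, dim, vigilance=0.9, lr=0.1):
        self.dim = dim
        self.vigilance = vigilance   # ART resonance threshold (assumed value)
        self.lr = lr                 # Hebbian learning rate (assumed value)
        self.neurons = []            # weight vector of each state neuron
        self.w = np.zeros((0, 0))    # synaptic weights between state neurons
        self.prev = None             # index of the previously active neuron

    def _match(self, x):
        """Return (best-matching neuron index, cosine similarity), or (None, 0)."""
        if not self.neurons:
            return None, 0.0
        sims = [float(x @ n) for n in self.neurons]
        j = int(np.argmax(sims))
        return j, sims[j]

    def observe(self, x):
        """ART step: resonate with an existing neuron or recruit a new one,
        then strengthen the synapse from the previous neuron (Hebbian rule)."""
        x = np.asarray(x, dtype=float)
        x = x / np.linalg.norm(x)
        j, sim = self._match(x)
        if j is None or sim < self.vigilance:
            # Mismatch: recruit a new state neuron for this observation.
            self.neurons.append(x.copy())
            n = len(self.neurons)
            w = np.zeros((n, n))
            w[:n - 1, :n - 1] = self.w
            self.w = w
            j = n - 1
        else:
            # Resonance: nudge the winning neuron toward the input.
            self.neurons[j] += self.lr * (x - self.neurons[j])
            self.neurons[j] /= np.linalg.norm(self.neurons[j])
        if self.prev is not None:
            # Hebbian rule: co-activation of consecutive neurons strengthens
            # the directed synapse between them.
            self.w[self.prev, j] += self.lr
        self.prev = j
        return j

    def predict_next(self, j):
        """Predict the successor state with the strongest synaptic potential."""
        row = self.w[j]
        return int(np.argmax(row)) if row.any() else None
```

In this sketch, calling observe() on each observation of an episode builds up the set of state neurons and their synaptic weight matrix, and predict_next() returns the successor with the strongest synaptic potential, mirroring the role the abstract assigns to synaptic potentials in state prediction and behaviour planning.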