Abstract: A path-planning algorithm based on hierarchical reinforcement learning is presented. Because the algorithm is built on reinforcement learning, it can learn without a model of the environment. Hierarchical reinforcement learning is employed mainly to update local strategies, so the algorithm does not depend on static information about the global environment or on the motion information of dynamic obstacles. Simulation experiments demonstrate the feasibility of the algorithm. Although it offers no obvious advantage in planning speed, its ability to learn in unknown dynamic environments is distinctive.
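The paper's hierarchical algorithm is not reproduced in the abstract. As an illustration of the model-free idea it rests on — learning a path from interaction alone, with no environment model — here is a minimal tabular Q-learning sketch on a grid world. This is an assumption-laden toy, not the paper's method: the flat (non-hierarchical) update, the grid encoding (0 = free, 1 = obstacle), the reward values, and all hyper-parameters are illustrative choices.

```python
import random

def q_learning_path(grid, start, goal, episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Model-free Q-learning on a grid: 0 = free cell, 1 = obstacle.

    Returns a greedy path from start to goal extracted from the learned
    Q-table, or None if the greedy policy fails to reach the goal.
    """
    rows, cols = len(grid), len(grid[0])
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    Q = {}  # (state, action index) -> value; built lazily from experience only

    def valid(r, c):
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0

    rng = random.Random(0)  # seeded for reproducibility
    for _ in range(episodes):
        state = start
        for _ in range(4 * rows * cols):  # step limit per episode
            if state == goal:
                break
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda i: Q.get((state, i), 0.0))
            dr, dc = actions[a]
            nr, nc = state[0] + dr, state[1] + dc
            nxt = (nr, nc) if valid(nr, nc) else state  # bump into walls in place
            reward = 10.0 if nxt == goal else -1.0  # step cost drives short paths
            best_next = max(Q.get((nxt, i), 0.0) for i in range(4))
            old = Q.get((state, a), 0.0)
            Q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt

    # Extract a greedy (no-exploration) path from the learned Q-values.
    path, state, seen = [start], start, {start}
    while state != goal:
        a = max(range(4), key=lambda i: Q.get((state, i), 0.0))
        dr, dc = actions[a]
        nxt = (state[0] + dr, state[1] + dc)
        if not valid(*nxt) or nxt in seen:
            return None  # greedy policy loops or hits a wall: learning incomplete
        seen.add(nxt)
        path.append(nxt)
        state = nxt
    return path
```

The agent never consults a map or obstacle model when updating Q; it only observes the next state and reward, which is what lets this family of methods work in unknown environments. The paper's hierarchical variant would additionally decompose the task and restrict such updates to local sub-strategies.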