Abstract: This paper proposes a new reinforcement learning algorithm with continuous vector output (CVRL) for the mobile robot navigation problem. CVRL is hierarchically structured. The lower layer is composed of several groups of unit actions, and real-valued vector outputs are produced by combining them. The higher layer is a Q-learning unit defined on the space of combined actions; its responsibility is to select a proper combined action. The detailed implementation of the CVRL navigation controller is given, and simulation results demonstrate its effectiveness.
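The two-layer idea described above can be sketched as follows. This is a minimal illustrative assumption, not the paper's actual controller: the unit-action groups (here, discrete linear and angular velocities), the state encoding, and all parameter values are hypothetical, and the higher layer is shown as plain tabular Q-learning over the Cartesian product of the groups.

```python
import itertools
import random
from collections import defaultdict

# Lower layer (assumed): groups of unit actions. Picking one unit action
# per group yields a real-valued control vector (v, w).
linear_group = [0.0, 0.2, 0.4]     # hypothetical linear speeds (m/s)
angular_group = [-0.5, 0.0, 0.5]   # hypothetical angular speeds (rad/s)

# Combined action space = Cartesian product of the unit-action groups.
combined_actions = list(itertools.product(linear_group, angular_group))

Q = defaultdict(float)             # Q[(state, action_index)] -> value
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # illustrative parameters

def select_action(state):
    """Higher layer: epsilon-greedy choice over combined actions."""
    if random.random() < epsilon:
        return random.randrange(len(combined_actions))
    return max(range(len(combined_actions)), key=lambda a: Q[(state, a)])

def update(state, a, reward, next_state):
    """One Q-learning backup on the combined-action space."""
    best_next = max(Q[(next_state, b)] for b in range(len(combined_actions)))
    Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])

# The chosen index maps back to a continuous vector command.
a = select_action("s0")
v, w = combined_actions[a]
```

The point of the sketch is the division of labor: the Q-learning unit works on a finite combined-action space, while the continuous vector output emerges from composing the unit actions of the lower layer.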