Exploratory Policy Generation Methods in On-line Deep Reinforcement Learning: A Survey

  • Abstract: To address the exploration-exploitation dilemma in the training of on-line deep reinforcement learning (DRL) algorithms, this paper first gives a brief overview of on-line DRL and then surveys exploratory policy generation methods in single-agent on-line DRL algorithms, classified by the relationship between the exploratory policy and the task policy. First, exploratory policy generation methods in the reward space and the parameter space of the task policy are examined. For exploration in the reward space, methods that introduce intrinsic rewards are classified and discussed, and research progress is analyzed in light of their respective advantages and disadvantages. For exploration in the parameter space, the representation of individual fitness functions in neuroevolution algorithms is analyzed in detail, taking both task performance and diversity requirements into account. Next, approaches that combine action-space exploration with parameter-space exploration are reviewed, followed by an introduction to high-level task goal spaces and task-independent exploratory policy generation methods. Finally, methods for handling safety constraints on exploratory policies are discussed by category, and the open challenges and future research directions of exploratory policy generation are presented.

     
