Multi-agent Reinforcement Learning Based on Adaptive State Approximation in Sparse Reward Scenarios

  • Abstract: Sparse reward is one of the main challenges in multi-agent reinforcement learning: existing algorithms struggle to train agent teams effectively in sparse-reward scenarios, which in turn leads to low exploration efficiency. To address this problem, this paper proposes a multi-agent reinforcement learning algorithm based on adaptive state approximation. Inspired by how humans learn when rewards are scarce, the algorithm adaptively retrieves approximate states from the replay buffer by considering the similarity between agent states, adds them to a candidate state set, and uses the exploration information in this set to promote policy training. In addition, the algorithm takes the distance between the approximate state and the current local state as an intrinsic reward for the agent, guiding the agents to explore the unknown environment more effectively while maximizing the joint state-action value and to find the optimal policy quickly. Experimental results show that the proposed algorithm outperforms existing reinforcement learning methods in multi-agent pursuit (predator-prey) scenarios with different degrees of reward sparsity, demonstrating its robustness and effectiveness and accelerating the agents' learning.

     

    Abstract: Sparse reward is one of the main challenges in multi-agent reinforcement learning: existing algorithms struggle to train agent teams effectively in sparse-reward scenarios, which results in low exploration efficiency. Inspired by human learning under reward scarcity, a multi-agent reinforcement learning algorithm based on adaptive state approximation (MAASA) is proposed to solve this problem. By considering the similarity among agent states, the algorithm adaptively retrieves approximate states from the replay buffer to fill a candidate state set, and uses the exploration information in this set to promote policy training. In addition, MAASA uses the distance between the approximate state and the current local state as an intrinsic reward for the agent, guiding the agent to explore the unknown environment more effectively while maximizing the joint state-action value and to find the optimal policy quickly. The experimental results show that the algorithm outperforms existing reinforcement learning methods in multi-agent predator-prey scenarios with different degrees of reward sparsity, demonstrating its robustness and effectiveness and accelerating the agents' learning.
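    The mechanism described in the abstract — retrieving approximate (most similar) states from the replay buffer and using the distance between the nearest approximate state and the current local state as an intrinsic exploration bonus — can be illustrated with a minimal sketch. This is not the paper's implementation: the Euclidean distance metric, the candidate-set size k, and the scaling coefficient beta are illustrative assumptions.

```python
import numpy as np

def approximate_states(local_state, replay_states, k=5):
    """Return the k replay-buffer states most similar (by Euclidean distance)
    to the agent's current local state -- a stand-in for the 'candidate state
    set' described in the abstract (k and the metric are assumptions)."""
    dists = np.linalg.norm(replay_states - local_state, axis=1)
    idx = np.argsort(dists)[:k]
    return replay_states[idx], dists[idx]

def intrinsic_reward(local_state, replay_states, k=5, beta=0.1):
    """Use the distance to the nearest approximate state as an intrinsic
    reward: the farther the current state is from anything already stored,
    the larger the exploration bonus."""
    _, dists = approximate_states(local_state, replay_states, k)
    return beta * dists.min()

# Usage sketch: add the bonus to the environment's (sparse) extrinsic reward.
replay_states = np.random.rand(1000, 8)   # states previously stored in the buffer
local_state = np.random.rand(8)           # agent's current local observation
extrinsic = 0.0                           # sparse reward from the environment
r_total = extrinsic + intrinsic_reward(local_state, replay_states)
```

    Under this reading, the intrinsic term rewards novelty relative to past experience, which is one plausible way the distance-based bonus could encourage exploration while the extrinsic term still drives the agents toward maximizing the joint state-action value.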

     
