Abstract:
Sparse rewards are one of the main challenges in multi-agent reinforcement learning: existing algorithms struggle to train agent teams effectively in sparse-reward scenarios, resulting in low exploration efficiency. Inspired by how humans learn when rewards are scarce, this paper proposes a multi-agent reinforcement learning algorithm based on adaptive state approximation (MAASA) to address this problem. By measuring the similarity among agent states, the algorithm automatically retrieves approximate states from the replay buffer to populate a candidate state set, and uses the exploration information in this set to guide policy training. In addition, MAASA uses the distance between the approximate state and the agent's current local state as an intrinsic reward, guiding the agent to explore the unknown environment more effectively while maximizing the joint state-action value and to converge quickly to the optimal policy. Experimental results show that the algorithm outperforms existing reinforcement learning methods in multi-agent predator-prey scenarios with different reward sparsities, demonstrating its robustness and effectiveness and accelerating the agents' learning.
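
The following is a minimal sketch of the intrinsic-reward idea described above, not the paper's actual implementation: it assumes Euclidean distance as the similarity measure, a k-nearest-neighbor candidate state set, and a scaling coefficient; the function names (`nearest_approximate_state`, `intrinsic_reward`) and parameters (`k`, `scale`) are hypothetical.

```python
import numpy as np

def nearest_approximate_state(local_state, replay_states, k=5):
    """Illustrative: select the k most similar buffered states (by Euclidean
    distance) as the candidate state set, then return the closest one."""
    dists = np.linalg.norm(replay_states - local_state, axis=1)
    candidate_idx = np.argsort(dists)[:k]        # candidate state set
    best = candidate_idx[0]                      # most similar approximate state
    return replay_states[best], dists[best]

def intrinsic_reward(local_state, replay_states, scale=0.1):
    """Illustrative: the distance to the retrieved approximate state acts as an
    intrinsic exploration bonus added to the (sparse) environment reward."""
    _, dist = nearest_approximate_state(local_state, replay_states)
    return scale * dist
```

In this sketch, a larger distance to previously visited (approximate) states yields a larger bonus, encouraging each agent to visit under-explored regions; how MAASA actually constructs the candidate set and weights the bonus is specified in the paper itself.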