Abstract:
A multi-agent reinforcement learning mission supervisor (MARLMS) is designed for differential drive robots using trial-and-error learning. The proposed MARLMS addresses a challenge inherent in behavior-based multi-agent systems: the switching rules that determine behavior priorities must be designed by hand and rely heavily on human intelligence. Building upon the null-space-based behavioral control (NSBC) framework, a differential drive model is introduced to replace the particle model. As a result, a paradigm of NSBC with nonholonomic constraints is presented for the first time, which improves the system's robustness to minimum extremum states. The behavior priority switching problem is then modeled as a cooperative Markov game, and a joint policy is learned to determine behavior priorities dynamically. The proposed MARLMS not only eliminates the need for manually designed switching rules but also reduces the computational and storage burdens during online operation. Simulation results demonstrate the superior behavior priority switching performance of the proposed MARLMS, and a successful implementation on AgileX Limo robots validates its practicality.
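For readers unfamiliar with the two ingredients named above, the following is a minimal sketch, not the paper's formulation. It shows the generic null-space idea for a scalar task (a lower-priority behavior velocity is projected into the null space of a higher-priority task Jacobian before the two are summed) and the standard textbook differential drive (unicycle) kinematics that distinguish such a robot from a particle. All function names, the 2D setting, and the numeric values are assumptions made for illustration only.

```python
import math


def null_space_projector(j):
    """Return N = I - J^+ J for a 1x2 task Jacobian j (hypothetical helper)."""
    jx, jy = j
    n2 = jx * jx + jy * jy
    if n2 == 0.0:
        # A zero Jacobian constrains nothing, so the projector is the identity.
        return [[1.0, 0.0], [0.0, 1.0]]
    return [
        [1.0 - jx * jx / n2, -jx * jy / n2],
        [-jx * jy / n2, 1.0 - jy * jy / n2],
    ]


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [
        m[0][0] * v[0] + m[0][1] * v[1],
        m[1][0] * v[0] + m[1][1] * v[1],
    ]


def compose(v_high, j_high, v_low):
    """Combine two behavior velocities: v = v_high + N_high * v_low.

    The lower-priority velocity only acts along directions that do not
    disturb the higher-priority task.
    """
    projected = mat_vec(null_space_projector(j_high), v_low)
    return [v_high[0] + projected[0], v_high[1] + projected[1]]


def diff_drive_step(x, y, theta, v, omega, dt):
    """One Euler step of the standard unicycle model.

    Unlike a particle, the robot can only translate along its heading
    theta, which is the nonholonomic constraint.
    """
    return (
        x + v * math.cos(theta) * dt,
        y + v * math.sin(theta) * dt,
        theta + omega * dt,
    )


if __name__ == "__main__":
    # Assumed example: the higher-priority task acts along x only.
    v_cmd = compose(v_high=[-0.5, 0.0], j_high=[1.0, 0.0], v_low=[1.0, 1.0])
    print("composite particle velocity:", v_cmd)
    print("pose after one step:", diff_drive_step(0.0, 0.0, 0.0, 1.0, 0.0, 0.1))
```

Swapping which behavior is passed as `v_high` changes the composite command, which is why the choice of priority order matters. In the sketch that order is fixed by the caller; the abstract describes learning it with a joint policy instead.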