张祯毅, 黄捷. 基于行为的多差速机器人强化学习任务监管器设计[J]. 机器人, 2024, 46(4): 397-416, 424. DOI: 10.13973/j.cnki.robot.230148
ZHANG Zhenyi, HUANG Jie. Reinforcement Learning Mission Supervisor Design for Behavior-based Differential Drive Robots[J]. ROBOT, 2024, 46(4): 397-416, 424. DOI: 10.13973/j.cnki.robot.230148

基于行为的多差速机器人强化学习任务监管器设计

Reinforcement Learning Mission Supervisor Design for Behavior-based Differential Drive Robots

  • 摘要: 针对多差速机器人系统提出了一种基于试错学习的多智能体强化学习任务监管器。此方法解决了基于行为的多智能体系统总是依赖人的智能设计切换规则以决策行为优先级的问题。首先,在零空间行为控制框架下引入了差速模型代替质点模型,首次推导了具有非完整约束的零空间行为控制范式,从而提升了系统对最小极值状态的鲁棒性。然后,首次将行为优先级切换问题建模为协作式马尔可夫博弈问题,学习了一个最优的联合策略以动态且智能地决策行为优先级,不仅避免了人工设计切换规则,而且降低了在线计算和存储负担。仿真结果显示,所提出多智能体强化学习任务监管器具有优越的行为优先级切换性能。在AgileX Limo系列多差速机器人系统上的成功应用,验证了该任务监管器的实用性。

     

    Abstract: A multi-agent reinforcement learning mission supervisor (MARLMS) based on trial-and-error learning is proposed for multiple differential drive robots. It addresses a limitation of behavior-based multi-agent systems, which rely on hand-designed switching rules to decide behavior priorities. First, within the null-space-based behavioral control (NSBC) framework, a differential drive model replaces the point-mass model, and an NSBC paradigm with nonholonomic constraints is derived for the first time, which improves the robustness of the system to minimum extremum states. Then, the behavior priority switching problem is modeled, for the first time, as a cooperative Markov game, and an optimal joint policy is learned to decide behavior priorities dynamically and intelligently. This not only removes the need to design switching rules manually but also reduces the online computational and storage burden. Simulation results show that the proposed MARLMS achieves superior behavior priority switching performance. Its successful deployment on a system of AgileX Limo differential drive robots verifies its practicality.

     
