Imitation Learning in Nursing Robots Based on Oriented Diversity

  • Abstract: Traditional robot imitation learning methods generally suffer from low imitation success rates and heavy reliance on the number of expert samples, which makes them unsuitable for highly time-varying, unstructured nursing scenarios. To address these problems, this paper proposes TGOD-SD, an imitation learning method for nursing robots based on oriented diversity. First, a trajectory generation paradigm with oriented diversity (TGOD) is constructed. It can be deployed in reinforcement-learning-based imitation learning algorithms without constructing a reward function, and it guides the agent to generate diverse imitation trajectories around the expert demonstration trajectory. Second, a trajectory matching method based on the Sinkhorn distance (SD) is proposed, which helps the agent search for the best-matching trajectory to use as the output of imitation learning. Finally, a sim-to-real trajectory transfer method based on joint angles is constructed to apply the learned trajectory to a real nursing robot. Extensive imitation learning experiments on the nursing robot show that TGOD-SD effectively improves the success rate of robot imitation learning, with an average improvement of 64.6% over state-of-the-art (SOTA) methods. The successfully imitated trajectories are also of higher quality, with the correlation coefficient with the expert demonstration trajectory increasing by 32.61% on average, and the expected time to successful learning is reduced to at most 62.5% of that of SOTA methods. More importantly, TGOD-SD learns from a single expert demonstration sample, which reduces the dependence on the number of expert demonstrations and effectively improves the practicality of robot imitation learning.
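
The trajectory matching step can be illustrated with a minimal sketch. The abstract does not give the cost function, the regularization strength, or the trajectory representation, so everything below is an assumption: trajectories are arrays of joint angles (one row per time step), the cost is squared Euclidean distance, time steps are weighted uniformly, and the names `sinkhorn_distance`, `best_match`, `eps`, and `n_iters` are hypothetical. It shows generic entropy-regularized optimal transport, not necessarily the formulation used in TGOD-SD.

```python
import numpy as np


def sinkhorn_distance(traj_a, traj_b, eps=0.1, n_iters=200):
    """Entropy-regularized transport cost between two trajectories.

    Each trajectory is an array of shape (time_steps, joints).
    Assumption: squared Euclidean cost and uniform weights per time step.
    """
    a = np.asarray(traj_a, dtype=float)
    b = np.asarray(traj_b, dtype=float)
    # Pairwise squared Euclidean cost between all pairs of time steps.
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    mu = np.full(len(a), 1.0 / len(a))  # uniform mass on trajectory a
    nu = np.full(len(b), 1.0 / len(b))  # uniform mass on trajectory b
    # Scale the regularizer by the largest cost to keep exp() well-behaved.
    scale = cost.max() if cost.max() > 0 else 1.0
    kernel = np.exp(-cost / (eps * scale))
    u = np.ones_like(mu)
    for _ in range(n_iters):
        # Alternating Sinkhorn updates that enforce the two marginals.
        v = nu / (kernel.T @ u)
        u = mu / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


def best_match(expert, candidates, **kwargs):
    """Return the index of the candidate closest to the expert, plus all distances."""
    dists = [sinkhorn_distance(expert, c, **kwargs) for c in candidates]
    return int(np.argmin(dists)), dists


# Usage: one expert demonstration and two generated candidates (toy data).
t = np.linspace(0.0, 1.0, 20)
expert = np.stack([np.sin(t), np.cos(t)], axis=1)
far = expert + 1.0    # candidate shifted far from the expert
near = expert + 0.01  # candidate that stays close to the expert
index, distances = best_match(expert, [far, near])
```

In this toy setup the candidate with the smallest distance to the single expert demonstration would be selected as the imitation output.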

     
