Embodied Large Model for Home Service Robot Task Planning

     

    Abstract: Home service robots require task planning ability to complete complex human instructions efficiently. Recent large language models (LLMs) can provide robots with powerful reasoning abilities for task planning, but without perception of the real scene, existing LLMs often generate unexecutable plans. To address this challenge, an embodied task planning framework based on large language models, named TaPA, is proposed; it effectively aligns scene information with the LLM to produce executable task plans. Specifically, a multimodal instruction-tuning dataset is constructed by synthesizing triples of scene information, human instructions, and action plans, which elicits the embodied task planning ability of existing pre-trained LLMs. A highly generalizable visual perception model is further employed to provide scene object information to the LLM. Extensive experimental results show that the proposed TaPA framework outperforms the existing GPT-3.5 model by 6.38% in task planning success rate, effectively advancing the real-world deployment of home service robots.
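The pipeline the abstract describes, where a visual perception model supplies the list of objects present in the scene and this list is combined with the human instruction before the LLM generates an action plan, can be sketched as follows. The prompt wording, function names, and the example triple below are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal sketch of TaPA-style grounded task planning.
# Prompt format and the training triple are illustrative assumptions.

def build_planning_prompt(scene_objects, instruction):
    """Combine perceived scene objects with a human instruction so the
    LLM plans only with objects that actually exist in the scene."""
    object_list = ", ".join(sorted(scene_objects))
    return (
        f"Objects visible in the room: {object_list}.\n"
        f"Instruction: {instruction}\n"
        "Produce a numbered list of executable robot actions that use "
        "only the objects listed above."
    )

# One synthesized instruction-tuning triple:
# (scene information, human instruction, action plan).
triple = {
    "scene": ["mug", "coffee machine", "counter", "sink"],
    "instruction": "Make me a cup of coffee.",
    "plan": [
        "1. Move to the counter.",
        "2. Pick up the mug.",
        "3. Place the mug under the coffee machine.",
        "4. Turn on the coffee machine.",
    ],
}

prompt = build_planning_prompt(triple["scene"], triple["instruction"])
```

Conditioning the prompt on the perceived object list is what makes the generated plan executable: the model cannot reference appliances or utensils that the perception module did not detect in the scene.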

     
