Embodied Large Model for Home Service Robot Task Planning
Graphical Abstract
Abstract
Home service robots require task planning ability to efficiently complete complex instructions from humans. Recent large language models (LLMs) can provide robots with powerful reasoning abilities, but without perception of the realistic scene, existing LLMs usually generate unexecutable task plans. To address this challenge, an embodied task planning framework based on large models, named TaPA, is proposed to effectively align scene information with LLMs and achieve executable task planning. Specifically, a multimodal dataset of instruction triples, each consisting of scene information, a human instruction, and an action plan, is synthesized to fine-tune existing pre-trained LLMs for embodied robotic task planning. A highly generalized visual perception model is further employed to provide scene object information to the LLM. Extensive experimental results validate that the proposed TaPA framework outperforms the existing GPT-3.5 model by 6.38% in the success rate of task planning, effectively facilitating the deployment of home service robots.