基于多模态扩散策略的长时任务规划

Long-horizon Task Planning Based on Multi-modal Diffusion Policy

  • 摘要: 机器人完成长时任务时,离线技能学习的动作序列多种多样,自然语言指令理解与长时任务语义关系复杂,信息密度高。针对这些挑战,提出一种基于多模态扩散策略的长时任务规划方法(MMDPP)以提升复杂环境下的任务完成率与鲁棒性。该方法利用大型视觉-语言模型将自然语言任务转化为结构化任务元素,引入多模态融合模块,对低维状态、图像观测信息与任务语义进行统一建模,使用选择性通道,降低梯度冲突,减少梯度交叉干扰。在此基础上构建条件扩散生成模型,直接输出结构一致、任务对齐的动作序列,实现从语言输入到动作预测的端到端策略规划。在MuJoCo-Kitchen-Image厨房环境(自建数据集)中,MMDPP方法完成长时任务的成功率相比基线方法显著提高;在Robosuite-Kitchen环境中,成功率比SiMPL方法提高了2.4%;在UR5真实机器人平台整理场景的操作任务中成功率为80%,展现出良好的准确率与现实适应性。本文方法显著增强了长时任务动作策略学习对任务变化的适应性,为基于扩散模型的机器人长时规划提供了有效范式。

     

    Abstract: In robotic operations for long-horizon tasks, the sequences of offline skill-learning actions are diverse, the relationships between natural language instruction comprehension and long-horizon task semantics are complex, and the information density is high. To address these challenges, a long-horizon task planning algorithm based on multi-modal diffusion policy (named MMDPP) is proposed to improve the task completion rate and robustness in complex environments. The method uses a large visual language model to transform natural language tasks into structured task elements, introduces a multimodal fusion module to model the low-dimensional state, image observation and task semantics in a unified way, and uses selective channels to reduce the gradient conflict and the gradient cross-interference. A conditional diffusion generation model is constructed on this basis to directly output structurally consistent and task-aligned action sequences, realizing endto-end strategy planning from language input to action prediction. In the MuJoCo-Kitchen-Image kitchen environment (selfconstructed dataset), the MMDPP method significantly outperforms the baseline method in long-horizon task success rate; in the Robosuite-Kitchen environment, it surpasses SiMPL by 2.4%; and it achieves an 80% success rate on the UR5 physical robot platform in table-top rearrangement tasks, demonstrating good accuracy and realistic adaptability in the manipulation tasks. The adaptability of action policy learning to task changes in long-horizon tasks is significantly enhanced by the proposed method, providing an effective paradigm for long-horizon robot planning based on diffusion modeling.

     

/

返回文章
返回