基于多模态扩散策略的长时任务规划

罗佳媛; 刘泽阳; 兰旭光

doi:10.13973/j.cnki.robot.250192

基于多模态扩散策略的长时任务规划

Long-horizon Task Planning Based on Multi-modal Diffusion Policy

摘要

摘要: 机器人完成长时任务时，离线技能学习的动作序列多种多样，自然语言指令理解与长时任务语义关系复杂，信息密度高。针对这些挑战，提出一种基于多模态扩散策略的长时任务规划方法（MMDPP）以提升复杂环境下的任务完成率与鲁棒性。该方法利用大型视觉－语言模型将自然语言任务转化为结构化任务元素，引入多模态融合模块，对低维状态、图像观测信息与任务语义进行统一建模，使用选择性通道，降低梯度冲突，减少梯度交叉干扰。在此基础上构建条件扩散生成模型，直接输出结构一致、任务对齐的动作序列，实现从语言输入到动作预测的端到端策略规划。在MuJoCo-Kitchen-Image厨房环境（自建数据集）中，MMDPP方法完成长时任务的成功率相比基线方法显著提高；在Robosuite-Kitchen环境中，成功率比SiMPL方法提高了2.4%；在UR5真实机器人平台整理场景的操作任务中成功率为80%，展现出良好的准确率与现实适应性。本文方法显著增强了长时任务动作策略学习对任务变化的适应性，为基于扩散模型的机器人长时规划提供了有效范式。

Abstract: In robotic operations for long-horizon tasks, the sequences of offline skill-learning actions are diverse, the relationships between natural language instruction comprehension and long-horizon task semantics are complex, and the information density is high. To address these challenges, a long-horizon task planning algorithm based on multi-modal diffusion policy (named MMDPP) is proposed to improve the task completion rate and robustness in complex environments. The method uses a large visual language model to transform natural language tasks into structured task elements, introduces a multimodal fusion module to model the low-dimensional state, image observation and task semantics in a unified way, and uses selective channels to reduce the gradient conflict and the gradient cross-interference. A conditional diffusion generation model is constructed on this basis to directly output structurally consistent and task-aligned action sequences, realizing endto-end strategy planning from language input to action prediction. In the MuJoCo-Kitchen-Image kitchen environment (selfconstructed dataset), the MMDPP method significantly outperforms the baseline method in long-horizon task success rate; in the Robosuite-Kitchen environment, it surpasses SiMPL by 2.4%; and it achieves an 80% success rate on the UR5 physical robot platform in table-top rearrangement tasks, demonstrating good accuracy and realistic adaptability in the manipulation tasks. The adaptability of action policy learning to task changes in long-horizon tasks is significantly enhanced by the proposed method, providing an effective paradigm for long-horizon robot planning based on diffusion modeling.

HTML全文

参考文献(36)

施引文献

资源附件(0)