Citation: SHUANG Feng (双丰), LU Wanyu (卢万玉), LI Shaodong (李少东), YUAN Xiaogang (袁小钢). Robotic Peg-in-hole Assembly Algorithm Based on Reinforcement Learning[J]. ROBOT (机器人), 2023, 45(3): 321-332. DOI: 10.13973/j.cnki.robot.220011

Robotic Peg-in-hole Assembly Algorithm Based on Reinforcement Learning

Abstract: To complete robotic peg-in-hole assembly tasks in unstructured environments, a variable-parameter admittance control algorithm based on deep deterministic policy gradient (DDPG) and integrated with a fuzzy reward mechanism is proposed to improve assembly efficiency in unknown environments. A mechanical model of the contact states in peg-in-hole assembly is established, and the assembly mechanism is studied to guide the formulation of the robot's assembly strategy. Compliant peg-in-hole assembly is realized with an admittance controller whose optimal parameters are identified online by the DDPG algorithm. Fuzzy rules are introduced into the reward function to keep the policy from falling into a locally optimal assembly strategy, which improves assembly quality. Assembly experiments are carried out on holes of 5 different diameters and compared with the results of a fixed-parameter admittance model. The experimental results show that the proposed algorithm clearly outperforms the fixed-parameter model and, once it has converged, completes the assembly operation within 10 steps. The proposed algorithm is expected to meet the requirements of autonomous manipulation in unstructured environments.
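As an illustration of what a variable-parameter admittance controller involves, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes the common second-order admittance law M·x'' + B·x' + K·x = f_err, integrated with a semi-implicit Euler step. The names `AdmittanceState`, `admittance_step`, `run_episode` and `select_params` are hypothetical; `select_params` merely stands in for a learned policy such as a DDPG actor, and the fuzzy reward used during training is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class AdmittanceState:
    position: float = 0.0  # compliant position offset x (m)
    velocity: float = 0.0  # rate of change of the offset (m/s)


def admittance_step(state, force_error, mass, damping, stiffness, dt):
    """Advance M*x'' + B*x' + K*x = f_err by one semi-implicit Euler step."""
    accel = (force_error
             - damping * state.velocity
             - stiffness * state.position) / mass
    velocity = state.velocity + accel * dt
    position = state.position + velocity * dt
    return AdmittanceState(position, velocity)


def run_episode(select_params, force_error, steps, dt=0.01):
    """Run the controller, asking `select_params` for (M, B, K) at each step.

    `select_params` is a placeholder for a learned policy (assumption);
    here it can be any callable mapping the current state to parameters.
    """
    state = AdmittanceState()
    for _ in range(steps):
        mass, damping, stiffness = select_params(state)
        state = admittance_step(state, force_error, mass, damping, stiffness, dt)
    return state


def fixed_params(_state):
    """Fixed-parameter baseline: critically damped, since B = 2*sqrt(K*M)."""
    return 1.0, 20.0, 100.0


final_state = run_episode(fixed_params, force_error=10.0, steps=2000)
```

With the fixed baseline above, the offset settles at f_err / K. A learned policy would instead return different (M, B, K) values as the contact state changes, which is the role the abstract assigns to DDPG.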

     
