Target Position-guided In-hand Re-orientation for Five-fingered Dexterous Hands


Abstract: Re-orientation means rotating an object to a desired orientation; the most challenging case is rotating it from an arbitrary initial orientation to an arbitrary desired orientation. To guide anthropomorphic five-fingered dexterous hands with different degrees of actuation (DoA) to perform in-hand re-orientation efficiently and in a more human-like manner, a target position-guided method for generating in-hand object re-orientation policies is proposed. First, a feasible principle for designing target positions is established, inspired by how human hands manipulate objects during in-hand re-orientation and based on how the DoA are distributed across anthropomorphic five-fingered dexterous hands. The difference between the object's actual position and its target position during re-orientation is used as a component of the immediate reward, guiding the hand to keep the object near the target position. Second, inspired by the preparatory state of the human hand before a re-orientation task, a method is developed that samples the hand's joint positions at every state reset, in order to enhance its manipulation capability. Finally, the re-orientation policy is trained with the proximal policy optimization (PPO) algorithm, using a long short-term memory (LSTM) network and an asymmetric actor-critic architecture. Simulation results show that the proposed method enables the 9-DoA Schunk SVH dexterous hand, the 13-DoA BICE dexterous hand developed by the Beijing Institute of Control Engineering (BICE), and the 18-DoA Shadow dexterous hand to approach the predefined maximum number of consecutive successes in re-orientation tasks. Moreover, compared with an in-hand object re-orientation policy generation method without target position guidance, the proposed method significantly reduces the average number of steps required to complete re-orientation tasks. In summary, the proposed method enables anthropomorphic five-fingered dexterous hands with different DoA to re-orient objects efficiently and in a human-like manner through coordinated palm and finger action, markedly improving manipulation efficiency.
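The abstract uses the gap between the object's actual position and its target position as one component of the immediate reward. The sketch below shows one plausible way to combine such a position term with an orientation term. It illustrates the idea under stated assumptions and is not the paper's reward. The following choices are all hypothetical:

- the Euclidean position distance;
- the quaternion-angle rotation error;
- the form `rot_weight / (rot_dist + rot_eps)`, which is common in simulated in-hand re-orientation benchmarks;
- the default weights.

Terms such as action penalties or success bonuses are omitted.

```python
import math


def quat_angle(q1, q2):
    """Rotation angle (rad) between two unit quaternions given as (w, x, y, z)."""
    # abs() treats q and -q as the same rotation (double cover).
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return 2.0 * math.acos(dot)


def immediate_reward(obj_pos, target_pos, obj_quat, goal_quat,
                     pos_weight=10.0, rot_weight=1.0, rot_eps=0.1):
    """Hypothetical immediate reward: orientation term minus position penalty.

    The weights and functional form are illustrative assumptions, not the
    paper's values.
    """
    # Position term: distance between actual and target object positions.
    pos_dist = math.dist(obj_pos, target_pos)
    # Orientation term: grows as the object approaches the goal orientation.
    rot_dist = quat_angle(obj_quat, goal_quat)
    rot_reward = rot_weight / (rot_dist + rot_eps)
    return rot_reward - pos_weight * pos_dist
```

With these defaults, an object exactly at its target position and goal orientation scores 10.0. Assuming positions are in metres, every centimetre of drift from the target position costs 0.1. The policy is therefore pushed to keep the object near the target position while rotating it.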

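The abstract states only that the hand's joint positions are sampled at each reset, inspired by the human hand's preparatory state. One way to do this is to perturb a nominal posture uniformly and clip the result to the joint limits. The function name `sample_reset_joints`, the nominal posture `ready_pose`, and the perturbation width `noise_scale` are hypothetical choices for illustration.

```python
import random


def sample_reset_joints(lower, upper, ready_pose, noise_scale=0.2, rng=None):
    """Sample initial joint positions around a hypothetical preparatory posture.

    lower, upper: per-joint limits; ready_pose: nominal preparatory posture;
    noise_scale: half-width of the uniform perturbation as a fraction of each
    joint's range. All names and values are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    joints = []
    for lo, hi, q0 in zip(lower, upper, ready_pose):
        half_width = noise_scale * (hi - lo)
        q = q0 + rng.uniform(-half_width, half_width)
        joints.append(min(hi, max(lo, q)))  # clip to the joint limits
    return joints
```

Consider a call once per environment reset, for example `sample_reset_joints([0.0] * 5, [1.5] * 5, [0.3] * 5, rng=random.Random(0))`. It returns five joint positions within 0.3 of the nominal posture, all inside the limits. Clipping keeps every sampled posture reachable even when the perturbation is large.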
     
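For reference, the sketch below shows the clipped surrogate objective at the core of PPO, in its standard form. It covers only that loss and omits two parts of the architecture the abstract describes:

- the LSTM policy;
- the asymmetric critic, which in such architectures typically receives privileged simulator state while the actor sees only its own observations.

The value `clip_eps = 0.2` is a common default, not a value taken from the paper.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (to minimize), averaged over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

Taking the minimum means that, for a positive advantage, the policy gains nothing from pushing the probability ratio above 1 + clip_eps. This keeps each update close to the policy that collected the data.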
