A Visual-Tactile Fusion Method for Estimating the Grasping Force on Flexible Objects
Abstract: To address the manipulation problem of flexible objects, a visual-tactile fusion method for estimating the grasping force on flexible objects is proposed, named the MultiSense Local-Enhanced Transformer (MSLET). The model learns low-dimensional features from each sensor modality, infers the physical properties of the object to be grasped, and fuses the modality-specific physical-feature vectors to predict the grasp outcome; combined with prior experience of safe grasping, the optimal grasping force is then inferred. First, a Feature-to-Patch module is proposed to extract shallow features from both visual and tactile images: it builds image patches from these shallow features and derives their edge features, so that the model fully learns the feature information in the data and better infers the physical properties of the grasped object. Second, a Local-Enhanced module is proposed to strengthen local features: a depth-wise separable convolution is applied to the patch tokens produced by the multi-head self-attention mechanism, enhancing locality, promoting correlation between spatially adjacent tokens, and improving the prediction accuracy of grasp outcomes. Finally, comparative experiments show that, while maintaining runtime efficiency, the proposed algorithm improves grasping accuracy by 10.19% over the current state-of-the-art model, demonstrating that it can effectively estimate the grasping force on flexible objects.
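The abstract describes the Feature-to-Patch module as extracting shallow features from the visual and tactile images, deriving edge features, and tokenizing them into patches. The exact operators are not given in the abstract; the following is a minimal numpy sketch that assumes a Sobel operator as the edge extractor and non-overlapping square patches as the tokenization. The function names `sobel_edges` and `feature_to_patch` are illustrative, not from the paper.

```python
import numpy as np

def sobel_edges(img):
    """Edge-feature map via Sobel filters (assumed edge operator; the
    paper's exact shallow-feature extractor is not specified)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    p = np.pad(img.astype(float), 1)  # zero-pad for 'same' output size
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = p[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)  # gradient magnitude

def feature_to_patch(img, patch=4):
    """Split the edge-feature map into non-overlapping patch tokens,
    returning an array of shape (num_patches, patch*patch)."""
    e = sobel_edges(img)
    h, w = e.shape
    e = e[:h - h % patch, :w - w % patch]  # crop to a multiple of patch
    hp, wp = e.shape[0] // patch, e.shape[1] // patch
    tokens = e.reshape(hp, patch, wp, patch).transpose(0, 2, 1, 3)
    return tokens.reshape(hp * wp, patch * patch)
```

In the full method these patch tokens would be linearly projected and fed to the Transformer encoder for each modality (vision and touch) separately.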
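The Local-Enhanced module applies a depth-wise separable convolution to the token sequence produced by multi-head self-attention, so that spatially adjacent tokens interact. A minimal numpy sketch of that step is given below, assuming a 3x3 depth-wise kernel followed by a 1x1 point-wise projection; the kernel shapes and the same-padding choice are assumptions, as the abstract does not specify them.

```python
import numpy as np

def local_enhance(tokens, h, w, dw_kernel, pw_weight):
    """Local-Enhanced step (sketch): reshape the attention output back
    onto its 2-D grid and apply a depth-wise separable convolution.
    tokens: (h*w, d)  -- patch tokens from multi-head self-attention
    dw_kernel: (d, 3, 3)  -- one 3x3 kernel per channel (depth-wise)
    pw_weight: (d, d)     -- 1x1 point-wise channel-mixing projection
    """
    n, d = tokens.shape
    assert n == h * w, "token count must match the h x w patch grid"
    grid = tokens.reshape(h, w, d)
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))  # same-padding for 3x3
    out = np.zeros_like(grid)
    # depth-wise 3x3: each channel is convolved with its own kernel,
    # coupling each token to its spatial neighbours
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]          # (3, 3, d)
            out[i, j] = np.einsum('xyc,cxy->c', patch, dw_kernel)
    # point-wise 1x1: mix information across channels
    out = out @ pw_weight
    return out.reshape(n, d)
```

With an identity point-wise matrix and a depth-wise kernel that is 1 at the centre and 0 elsewhere, the operation reduces to the identity, which makes the shape handling easy to verify; in the actual module the kernels are learned.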