Citation: HUANG Zhong, REN Fuji, HU Min, LIU Juan. Robotic Facial Emotion Transfer Network Based on Transformer Framework and B-spline Smoothing Constraint[J]. ROBOT, 2023, 45(4): 395-408. DOI: 10.13973/j.cnki.robot.220351

Robotic Facial Emotion Transfer Network Based on Transformer Framework and B-spline Smoothing Constraint

Abstract: To improve the spatio-temporal consistency of facial emotion transfer for humanoid robots and to reduce the influence of mechanical motion constraints, a robotic facial emotion transfer network named RFEFormer, based on the Transformer architecture and a B-spline smoothing constraint, is proposed. The network consists of a facial deformation encoding subnet and an actuation sequence generation subnet. In the facial deformation encoding subnet, an intra-frame spatial attention module, built on the dual mechanisms of intra-domain deformation attention and inter-domain cooperative attention, is embedded into the Transformer encoder to represent intra-frame spatial information at different levels and granularities. In the actuation sequence generation subnet, a Transformer decoder performs cross-attention between the facial spatio-temporal sequence and the historical motor actuation sequence and produces a multi-step prediction of the future motor actuation sequence; a cubic B-spline smoothing constraint is then introduced to regularize the predicted sequence. Experimental results show that the motor actuation deviation, facial deformation fidelity, and motor motion smoothness of RFEFormer are 3.21%, 89.48%, and 90.63%, respectively, and that real-time facial emotion transfer runs at more than 25 frames per second. Compared with related methods, RFEFormer meets the real-time requirement while improving temporal metrics such as fidelity and smoothness, to which human perception is more sensitive and attentive.
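To give a feel for what cubic B-spline smoothing does to a motor sequence, the following is a minimal sketch in plain Python. The abstract does not say how the constraint is applied inside RFEFormer, so this is an assumption for illustration only: the predicted motor values are treated as control points of a uniform cubic B-spline, and the curve is sampled back at one point per step. The names `cubic_bspline_basis` and `bspline_smooth` are hypothetical and do not come from the paper.

```python
def cubic_bspline_basis(t):
    """Uniform cubic B-spline blending weights for a local parameter t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return (
        (1 - 3 * t + 3 * t2 - t3) / 6.0,
        (4 - 6 * t2 + 3 * t3) / 6.0,
        (1 + 3 * t + 3 * t2 - 3 * t3) / 6.0,
        t3 / 6.0,
    )


def bspline_smooth(seq, samples_per_segment=1):
    """Smooth a 1-D sequence by using its values as B-spline control points.

    Illustrative assumption: this is not the paper's formulation, only a
    generic cubic B-spline approximation of a predicted motor sequence.
    """
    if len(seq) < 2:
        return list(seq)
    # Repeat the end points so the curve starts and ends near the original ends.
    ctrl = [seq[0]] + list(seq) + [seq[-1]]
    out = []
    for i in range(len(ctrl) - 3):
        p0, p1, p2, p3 = ctrl[i:i + 4]
        for s in range(samples_per_segment):
            b0, b1, b2, b3 = cubic_bspline_basis(s / samples_per_segment)
            out.append(b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3)
    # Close the curve with the end of the last segment (t = 1).
    b0, b1, b2, b3 = cubic_bspline_basis(1.0)
    p0, p1, p2, p3 = ctrl[-4:]
    out.append(b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3)
    return out


# Hypothetical jittery motor command sequence (arbitrary units).
predicted = [0.0, 10.0, 0.0, 10.0, 0.0]
smoothed = bspline_smooth(predicted)
```

With one sample per segment, each output value is the weighted average (previous + 4 × current + next) / 6, so abrupt step-to-step jumps in the motor commands are damped while the sequence length is preserved. A larger `samples_per_segment` would yield a denser trajectory between steps.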

     
