Fusion of Dual-layer Semantic Information for 3D Human Pose Estimation Network

    Abstract: When fusing multiple feasible solutions for a human pose, existing methods do not adequately learn the dependencies between hypotheses, which tends to reduce the accuracy of the fused result. To address this, a 3D human pose estimation network that fuses dual-layer semantic information is proposed. First, a hierarchical feature extraction module models the intrinsic structural information of human joints, extracts hypothesis features containing different levels of semantic information, and improves the utilization of joint position information. Second, to further improve network performance, a feature refinement module passes information within each hypothesis feature, strengthening the correlation between joint positions. Finally, a hierarchical feature fusion module and an association calculation sub-module learn the dependencies among the multiple hypothesis features and, based on those dependencies, pass information across hypotheses so that they are fused into a single accurate hypothesis feature. In this way, the different levels of semantic information in the different hypotheses are fully exploited to obtain the final 3D human pose estimate. The proposed model is evaluated on the Human3.6M, MPI-INF-3DHP, and HumanEva-I datasets. The experimental results show that the method improves the accuracy of 3D human pose estimation and copes effectively with self-occlusion and complex poses.
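    The cross-hypothesis step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the dependencies are computed, so everything below is an assumption made for illustration: the function name `fuse_hypotheses` is hypothetical, scaled dot-product similarity with a softmax stands in for the association calculation, and a plain average stands in for the final fusion. It is not the authors' module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_hypotheses(features):
    """Fuse H hypothesis feature vectors (each of length D) into one vector.

    Assumed stand-in for the paper's fusion module:
      1. association: scaled dot-product similarity between hypotheses
      2. message passing: each hypothesis becomes a weighted mix of all
      3. fusion: the updated hypotheses are averaged into a single feature
    """
    if not features:
        raise ValueError("need at least one hypothesis feature")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all hypothesis features must have the same length")

    scale = math.sqrt(dim)
    updated = []
    for query in features:
        # Dependency of this hypothesis on every hypothesis (itself included).
        weights = softmax([dot(query, key) / scale for key in features])
        # Cross-hypothesis message: weighted sum of all hypothesis features.
        message = [
            sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
        ]
        updated.append(message)

    # Collapse the updated hypotheses into a single fused feature.
    return [sum(u[d] for u in updated) / len(updated) for d in range(dim)]
```

    Because every step is a convex combination, the fused feature always lies within the range spanned by the input hypotheses. A learned module would replace the fixed similarity and the average with trainable projections.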

     
