Optimal Estimation Method of 3-Dimensional Human Pose Based on Video Frame Coherent Information

Tan Jiawei (谭嘉崴); Ding Qichuan (丁其川); Bai Zhongyu (白忠玉)

doi:10.13973/j.cnki.robot.200023

Optimal Estimation Method of 3-Dimensional Human Pose Based on Video Frame Coherent Information

Abstract

Video-based 3D human pose estimation is traditionally performed by first estimating the 3D pose in each image frame and then arranging the per-frame estimates in frame order to obtain the 3D pose sequence for the video. This approach considers neither the continuity of human motion across consecutive frames nor the spatial consistency of the connections between joints, so the estimates often exhibit high-frequency jitter and large deviations in the motion. To address this problem, an optimized 3D pose estimation method based on the coherent information of video frames is proposed. First, 2D pose estimates are used to optimize the 3D joint coordinates of the human body in order to reduce jitter. Second, backward and forward predictions of joint motion from the preceding and following frames are introduced to maintain motion continuity. Finally, bone connection constraints are added to build a model that keeps the motion trajectory smooth and keeps the joint connection structure consistent before and after optimization, yielding an accurate estimate of the 3D human pose. On the public dataset MPI-INF-3DHP, the mean joint error of the proposed method is 3.2% lower than that of the baseline 3D pose estimation method. On the public dataset 3DPW, the acceleration error is 44% lower than in the unoptimized case.
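The abstract does not give the optimization model itself, so the following is only a minimal sketch of two of the ideas it names (trajectory smoothness and bone connection constraints), together with a commonly used definition of acceleration error. The function names, the quadratic second-difference penalty, the weight `lam`, and the bone-length rescaling are all assumptions made for illustration and are not the authors' formulation; the 2D-pose term and the forward/backward prediction terms are omitted.

```python
import numpy as np

def acceleration_error(pred, gt):
    """Mean norm of the difference between the second-order finite
    differences of two pose sequences with shape (T, J, 3)."""
    acc_pred = pred[2:] - 2.0 * pred[1:-1] + pred[:-2]
    acc_gt = gt[2:] - 2.0 * gt[1:-1] + gt[:-2]
    return np.linalg.norm(acc_pred - acc_gt, axis=-1).mean()

def smooth_trajectory(poses, lam=5.0):
    """Assumed stand-in for temporal optimization: minimize
    ||x - poses||^2 + lam * ||D x||^2 over time, where D is the
    second-difference operator. Solved in closed form."""
    T = poses.shape[0]
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]
    A = np.eye(T) + lam * D.T @ D
    flat = poses.reshape(T, -1)
    return np.linalg.solve(A, flat).reshape(poses.shape)

def enforce_bone_lengths(poses, parents, lengths):
    """Assumed stand-in for the bone constraint: keep each bone's
    direction but rescale it to a fixed length. Requires
    parents[j] < j, with -1 marking the root joint."""
    out = poses.copy()
    for j, p in enumerate(parents):
        if p < 0:
            continue
        direction = poses[:, j] - poses[:, p]
        direction /= np.linalg.norm(direction, axis=-1, keepdims=True) + 1e-8
        out[:, j] = out[:, p] + lengths[j] * direction
    return out

# Synthetic three-joint chain (hypothetical data, not from either dataset).
rng = np.random.default_rng(0)
T = 60
parents = [-1, 0, 1]
lengths = np.array([0.0, 0.5, 0.4])
gt = np.zeros((T, 3, 3))
gt[:, 0, 0] = np.sin(0.1 * np.arange(T))   # root sways along x
gt[:, 1] = gt[:, 0] + [0.0, 0.5, 0.0]       # first bone, length 0.5
gt[:, 2] = gt[:, 1] + [0.0, 0.4, 0.0]       # second bone, length 0.4
noisy = gt + 0.02 * rng.normal(size=gt.shape)  # simulated per-frame jitter

smoothed = smooth_trajectory(noisy)
refined = enforce_bone_lengths(smoothed, parents, lengths)
print("acceleration error, per-frame estimates:", acceleration_error(noisy, gt))
print("acceleration error, after smoothing:", acceleration_error(smoothed, gt))
```

In this sketch the smoothing and the bone-length step are applied one after the other for simplicity, whereas the abstract describes a single model that combines these constraints.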
