Abstract:
In video-based 3D human pose estimation, the conventional approach first estimates the 3D pose in each frame independently and then arranges the per-frame results in frame order to obtain the 3D pose sequence for the video. This approach considers neither the continuity of human motion across consecutive frames nor the spatial consistency of the connections between joints, which leads to high-frequency jitter and large bias in the estimates. To address this problem, a 3D pose optimization method based on the coherent information across video frames is presented. First, 2D pose estimates are used to optimize the 3D joint coordinates of the human body and reduce jitter. Second, forward and backward predictions of joint motion from the preceding and following frames are introduced to keep the movement consistent. Finally, bone connection constraints are added, yielding a model that keeps the human motion trajectory smooth and keeps the joint connection structure consistent before and after optimization, so that the 3D human pose is estimated accurately. Tests on the public dataset MPI-INF-3DHP show that the average joint error of the proposed method is 3.2% lower than that of the reference 3D pose estimation method. Tests on the public dataset 3DPW show that the acceleration error is 44% lower than without the optimization.
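The three ingredients described above (2D pose agreement, forward and backward motion prediction, and bone connection constraints) can be illustrated as a single energy over a pose sequence. The following is only a minimal sketch under stated assumptions, not the model used in this work: the pinhole camera with focal length f and principal point c, the constant-velocity form of the predictions, the squared-error penalties, the weights w2d, wt, wb, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def project(X, f=1000.0, c=(500.0, 500.0)):
    """Pinhole projection of camera-space points (..., 3) to pixels (assumed camera model)."""
    return f * X[..., :2] / X[..., 2:3] + np.asarray(c)

def reprojection_term(X, x2d, f=1000.0, c=(500.0, 500.0)):
    """Squared distance between projected 3D joints and the 2D pose estimates."""
    return np.sum((project(X, f, c) - x2d) ** 2)

def temporal_term(X):
    """Deviation of each frame from constant-velocity predictions (an assumption).

    Forward prediction of frame t from frames t-2, t-1: 2*X[t-1] - X[t-2].
    Backward prediction of frame t from frames t+1, t+2: 2*X[t+1] - X[t+2].
    """
    fwd = X[2:] - (2.0 * X[1:-1] - X[:-2])
    bwd = X[:-2] - (2.0 * X[1:-1] - X[2:])
    return np.sum(fwd ** 2) + np.sum(bwd ** 2)

def bone_lengths(X, bones):
    """Length of every bone (pair of joint indices) in every frame: shape (T, B)."""
    b = np.asarray(bones)
    return np.linalg.norm(X[:, b[:, 0]] - X[:, b[:, 1]], axis=-1)

def bone_term(X, X0, bones):
    """Change in bone lengths between the initial estimate X0 and the optimized X."""
    return np.sum((bone_lengths(X, bones) - bone_lengths(X0, bones)) ** 2)

def energy(X, X0, x2d, bones, w2d=1.0, wt=1.0, wb=1.0):
    """Weighted sum of the three terms; X and X0 have shape (T, J, 3), x2d has shape (T, J, 2)."""
    return (w2d * reprojection_term(X, x2d)
            + wt * temporal_term(X)
            + wb * bone_term(X, X0, bones))
```

Under these assumptions, a sequence that moves at constant velocity, matches its 2D observations, and preserves its bone lengths has zero energy, while a jittered copy of it is penalized by all three terms. In practice such an energy would be minimized over X starting from the per-frame estimates X0, with any general-purpose nonlinear least-squares or gradient-based solver.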