Abstract:
To study how different residual connection methods affect convolutional neural networks (CNNs) for human motion prediction, this paper investigates how residual connections can be used to build an effective prediction model that captures human motion features with a network of a given depth. Based on the arrangement of human skeletal joints, a symmetric residual connection method is proposed for predicting human skeletal joints, and a symmetric residual block (SRB) is designed around it. In the SRB, the receptive field of the last convolution kernel is maximized so that it covers the joint information of the whole body. The symmetric connections make efficient use of shallow dynamic features, which improves prediction performance and reduces the number of model parameters. An end-to-end convolutional network, named the symmetric residual network (SRNet), is built from two SRBs and one decoder, and it achieves higher accuracy than the baseline methods. Human motion prediction experiments are implemented in TensorFlow and carried out on two public datasets, Human3.6M and CMU-Mocap. The results indicate that the proposed method reduces the mean per joint position error (MPJPE) by 0.2 mm to 1 mm at each prediction time point compared with the baseline methods, which confirms the effectiveness of SRNet for modeling the global spatial features of the human body.
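
For readers unfamiliar with the evaluation metric, MPJPE is commonly computed as the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints. The following is a minimal sketch of that common definition, not the paper's evaluation code; the function name `mpjpe` and the two-joint pose are hypothetical and chosen only for illustration.

```python
import math


def mpjpe(pred, target):
    """Mean per joint position error for a single pose.

    pred, target: equally long sequences of (x, y, z) joint positions in mm.
    Returns the average Euclidean distance between matching joints, in mm.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    total = sum(math.dist(p, t) for p, t in zip(pred, target))
    return total / len(pred)


# Hypothetical two-joint pose (illustrative values, not taken from any dataset).
target = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (100.0, 0.0, 1.0)]

# Joint errors are 5 mm and 1 mm, so the mean is 3 mm.
error_mm = mpjpe(pred, target)
```

In a motion prediction setting, this quantity would typically be computed per predicted frame and then averaged over test sequences at each prediction time point.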