Action Recognition and Prediction of Human Skeleton Based on BSCPs-RF
YIN Yanfang1,2, SUN Nongliang2, LIU Ming1, REN Guoqiang1
1. Department of Electrical Engineering & Information Technology, Shandong University of Science and Technology, Jinan 250031, China;
2. College of Electronic Communications and Physics, Shandong University of Science and Technology, Qingdao 266590, China
YIN Yanfang, SUN Nongliang, LIU Ming, REN Guoqiang. Action Recognition and Prediction of Human Skeleton Based on BSCPs-RF. ROBOT, 2017, 39(6): 795-802. DOI: 10.13973/j.cnki.robot.2017.0795.
Abstract: For continuous action recognition from human skeleton sequences, an action recognition and prediction method based on B-spline control points and random forest (BSCPs-RF) is proposed. First, local linear regression and single-frame skeleton normalization are used to preprocess the skeleton sequence, eliminating the effects of jitter noise, displacement and scale. Then, B-spline curve control points are used as a speed-independent feature of the skeleton sequence, and real-time action sequences are labelled with synchronous voice cue words to improve the efficiency of sample collection. Finally, a random-forest-based action recognition and prediction method is employed as the classifier, and an ensemble learning technique is used to optimize the combination of multiple classifiers and boost recognition accuracy. The influence of different parameter values on recognition performance is analyzed, and the method is tested on the MSR-Action3D benchmark database and on a real-time skeleton dataset collected with an RGB-D device. The results show that the proposed method outperforms several existing methods on MSR-Action3D and achieves high recognition accuracy on the real-time data, which verifies its effectiveness.