Human Action Recognition Based on Large Margin Nearest Neighbor
LIU Xiaoli1, YIN Jianqin2, WEI Jun1, WANG Lei3, WU Yanchun1
1. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, School of Information Science and Engineering, University of Jinan, Ji'nan 250022, China;
2. School of Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China;
3. Laboratory for Human Machine Control, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Abstract: To recognize actions in daily life, improve the service quality of home service robots, and provide a safe and comfortable environment for humans, a metric learning method based on the Mahalanobis distance is proposed for human action recognition. First, a Kinect sensor is used to acquire the joint point data of human actions. Then, an action-sensitive feature set is constructed from the joint point data: structure vectors of the human body and their corresponding angles are built from the joint positions, and the length of each sample is normalized. The large margin nearest neighbor (LMNN) method is adopted to learn a Mahalanobis distance and thereby obtain the transformation matrix L, and the normalized data are mapped into a more discriminative feature space. Finally, the k-nearest neighbor algorithm is used to recognize human actions. On our dataset, an accuracy of 97% is achieved. Experimental results show that the LMNN algorithm improves the distribution of the data (i.e., the intra-class distance is reduced and the inter-class distance is enlarged) and can accomplish the human action recognition task.
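The pipeline described in the abstract (joint features, length normalization, LMNN mapping, then k-NN classification) can be sketched as follows. This is a minimal illustration only: it assumes the metric-learn and scikit-learn packages for LMNN and k-NN, and an illustrative choice of joint pairs and frame count rather than the authors' exact feature set.

import numpy as np
from metric_learn import LMNN
from sklearn.neighbors import KNeighborsClassifier

def frame_features(joints):
    # joints: (n_joints, 3) array of 3D Kinect joint coordinates for one frame.
    # Build structure vectors as differences between selected joint pairs and
    # the angles between consecutive structure vectors. The pairs below are
    # illustrative (a torso-neck-head chain), not the paper's exact choice.
    pairs = [(0, 1), (1, 2), (2, 3)]
    vectors = [joints[j] - joints[i] for i, j in pairs]
    angles = []
    for a, b in zip(vectors[:-1], vectors[1:]):
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.concatenate([np.ravel(vectors), angles])

def normalize_length(sequence, n_frames=30):
    # Resample a variable-length sequence of per-frame feature vectors to a
    # fixed number of frames so every action sample has the same dimension.
    sequence = np.asarray(sequence)
    idx = np.linspace(0, len(sequence) - 1, n_frames).round().astype(int)
    return sequence[idx].ravel()

def train(X, y):
    # X: (n_samples, d) normalized action features; y: action labels.
    # LMNN learns the linear transformation L; k-NN then classifies
    # in the transformed (Mahalanobis) feature space.
    lmnn = LMNN()
    X_mapped = lmnn.fit_transform(X, y)
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_mapped, y)
    return lmnn, knn

def predict(lmnn, knn, X_new):
    return knn.predict(lmnn.transform(X_new))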