Abstract: A human pose estimation algorithm based on a multi-level dynamic model is presented for monocular videos. Firstly, a multi-level dynamic model of the human pose is constructed to decompose the entire pose into articulated pose parts, so that near-optimal pose candidates can be approached by optimizing the candidates of each part; this decomposition alleviates the ambiguity caused by estimating the entire pose as a whole. Secondly, an algorithm for computing the pose consistency between adjacent video frames is proposed by constructing virtual poses, which exploits the continuity of appearance and motion features across adjacent frames to improve estimation accuracy. Thirdly, particle swarm optimization is employed to search for the best pose-part candidates using only a small number of candidates, and the resulting parts are then recomposed into the optimal entire pose. The effectiveness of the proposed method is evaluated and compared with several related state-of-the-art methods on challenging video sequences, and the results show significant improvements.
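The abstract describes the search step only at a high level. The following minimal Python sketch illustrates the general idea of a particle swarm search over per-part candidate selections, scored by a combination of appearance score and consistency with the previous frame. It is not the paper's actual formulation: the part count, candidate counts, scoring terms, weights, and data below are all assumptions made for illustration.

# Minimal illustrative sketch (assumed setup, not the paper's formulation):
# PSO over per-part candidate indices; fitness combines detector scores with
# temporal consistency to the previous frame's pose.
import numpy as np

rng = np.random.default_rng(0)

NUM_PARTS = 10          # assumed number of articulated pose parts
CANDS_PER_PART = 20     # assumed number of candidates per part
NUM_PARTICLES = 30
ITERATIONS = 50

# Hypothetical data: candidate joint positions and detector scores for each part,
# plus the pose estimated in the previous frame (used for the consistency term).
cand_pos = rng.uniform(0, 100, size=(NUM_PARTS, CANDS_PER_PART, 2))
cand_score = rng.uniform(0, 1, size=(NUM_PARTS, CANDS_PER_PART))
prev_pose = rng.uniform(0, 100, size=(NUM_PARTS, 2))

def fitness(x):
    # A particle holds one continuous value per part, rounded to a candidate index.
    idx = np.clip(np.round(x), 0, CANDS_PER_PART - 1).astype(int)
    pose = cand_pos[np.arange(NUM_PARTS), idx]                # recomposed entire pose
    appearance = cand_score[np.arange(NUM_PARTS), idx].sum()
    consistency = -np.linalg.norm(pose - prev_pose, axis=1).mean()  # closer to previous frame is better
    return appearance + 0.1 * consistency                     # assumed weighting

# Standard PSO update loop (inertia + cognitive + social terms).
pos = rng.uniform(0, CANDS_PER_PART - 1, size=(NUM_PARTICLES, NUM_PARTS))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()
gbest_val = pbest_val.max()

w, c1, c2 = 0.7, 1.5, 1.5
for _ in range(ITERATIONS):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, CANDS_PER_PART - 1)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    if vals.max() > gbest_val:
        gbest, gbest_val = pos[vals.argmax()].copy(), vals.max()

best_idx = np.clip(np.round(gbest), 0, CANDS_PER_PART - 1).astype(int)
best_pose = cand_pos[np.arange(NUM_PARTS), best_idx]          # final recomposed pose
print("best fitness:", round(gbest_val, 3))

The discrete candidate indices are relaxed to continuous particle coordinates and rounded inside the fitness function; this is one common way to apply PSO to a combinatorial selection problem and is only an assumed design choice here.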