Abstract: To address the problems of movement representation and generalization in imitation learning, a cross entropy optimization algorithm is proposed to infer the parameters of mixture models. The proposed algorithm is easy to implement and computationally efficient. More importantly, it can automatically determine the optimal number of components in the mixture model. To produce generalized motion trajectories, a cross entropy regression algorithm is proposed. To further improve the adaptability of the algorithm in dynamic environments, the concept of task parameterization is introduced, and a task-parameterized cross entropy regression algorithm is proposed. Finally, a novel hammer-over-a-nail task is designed to validate the proposed methods and demonstrate their advantages. Simulation experiments in the robot physics simulator Gazebo show that the proposed algorithms are feasible in practical applications.
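The abstract does not spell out the update rules, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a Lloyd-style variant of cross-entropy clustering (CEC) in the spirit of Tabor and Spurek: each point is coded by the component with the lowest cost -ln p_k - ln N(x | mu_k, Sigma_k), and any component whose share of points falls below a threshold is discarded. Starting from more components than needed and letting this pruning run is one known way a cross-entropy criterion can settle the component count automatically. The fitted mixture is then used for standard Gaussian mixture regression (GMR), conditioning on time to retrieve a trajectory; GMR stands in here for the paper's cross entropy regression, which may differ. The names `cec_fit` and `gmr` and the parameters `k_init`, `min_frac`, and `seed` are hypothetical. Task parameterization is not shown.

```python
import numpy as np


def cec_fit(X, k_init=10, min_frac=0.02, n_iter=100, seed=0):
    """Hypothetical Lloyd-style cross-entropy clustering (CEC) sketch.

    Not the paper's algorithm: starts from k_init components and removes any
    component holding fewer than min_frac * n points (or too few to estimate a
    covariance), so the surviving component count is chosen by the data.
    Returns a list of (weight, mean, covariance) tuples.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # k-means-style start: assign each point to the nearest of k_init sampled points
    centers = X[rng.choice(n, size=k_init, replace=False)]
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    for _ in range(n_iter):
        params = []
        for k in np.unique(labels):
            pts = X[labels == k]
            if len(pts) < min_frac * n or len(pts) <= d:
                continue  # prune: component too small to keep
            cov = np.cov(pts, rowvar=False, bias=True) + 1e-6 * np.eye(d)
            params.append((len(pts) / n, pts.mean(axis=0), cov))
        if not params:
            raise ValueError("all components were pruned; lower min_frac")
        # CEC coding cost of x in component j: -ln p_j - ln N(x | mu_j, cov_j)
        costs = np.empty((n, len(params)))
        for j, (p, mu, cov) in enumerate(params):
            diff = X - mu
            maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
            _, logdet = np.linalg.slogdet(cov)
            costs[:, j] = -np.log(p) + 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        new_labels = costs.argmin(axis=1)
        # stop once no component was pruned and no point changed component
        converged = (len(params) == len(np.unique(labels))
                     and np.array_equal(new_labels, labels))
        labels = new_labels
        if converged:
            break
    total = sum(p for p, _, _ in params)
    return [(p / total, mu, cov) for p, mu, cov in params]


def gmr(params, T):
    """Standard Gaussian mixture regression: E[x_out | x_in = t] for each row t of T.

    The first T.shape[1] coordinates of each component are the inputs (e.g. time).
    """
    m, a = T.shape
    log_h = np.empty((m, len(params)))
    means = np.empty((len(params), m, params[0][1].size - a))
    for j, (p, mu, cov) in enumerate(params):
        s_ii, s_oi = cov[:a, :a], cov[a:, :a]
        diff = T - mu[:a]
        sol = np.linalg.solve(s_ii, diff.T).T  # each row: inv(S_ii) (t - mu_in)
        _, logdet = np.linalg.slogdet(s_ii)
        # log responsibility up to a shared constant: ln p_j + ln N(t | mu_in, S_ii)
        log_h[:, j] = np.log(p) - 0.5 * (logdet + np.sum(diff * sol, axis=1))
        means[j] = mu[a:] + sol @ s_oi.T  # mu_out + S_oi inv(S_ii) (t - mu_in)
    h = np.exp(log_h - log_h.max(axis=1, keepdims=True))
    h /= h.sum(axis=1, keepdims=True)
    return np.einsum("mj,jmo->mo", h, means)


if __name__ == "__main__":
    # Example: five noisy demonstrations of x(t) = sin(2*pi*t), stacked as (t, x) rows
    rng = np.random.default_rng(0)
    t = np.tile(np.linspace(0, 1, 100), 5)
    X = np.column_stack([t, np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(t.size)])
    params = cec_fit(X, k_init=12)
    traj = gmr(params, np.linspace(0, 1, 50)[:, None])  # mean trajectory, shape (50, 1)
    print(len(params), "components kept")
```

The design choice worth noting is over-initialization plus pruning: `k_init` is only an upper bound, and the -ln p_k term favors larger components so that redundant ones shrink and are removed. Because assignments are hard and the start is random, the number of surviving components can vary with `seed`.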