MA Lu, LIU Chengju, LIN Limin, XU Binchen, CHEN Qijun. AM-RPPO Based Control Method for Biped Adaptive Locomotion[J]. ROBOT, 2019, 41(6): 731-741. DOI: 10.13973/j.cnki.robot.180785

AM-RPPO Based Control Method for Biped Adaptive Locomotion

  • A deep reinforcement learning (DRL) method based on AM-RPPO (attention mechanism-recurrent proximal policy optimization) is proposed and applied to the adaptive locomotion control of biped robots. First, the joint-space walking control problem for a biped robot in an unknown environment is modeled as a partially observable Markov decision process (POMDP), and the bias in the estimate of the real state made by DRL algorithms and proximal policy optimization (PPO) is illustrated. Second, the recurrent neural network (RNN) architecture is introduced, and its forward propagation of observation states in a time-sequential environment is analyzed and contrasted with that of multi-layer perceptrons. The RNN is embedded in both the action generation network and the value function generation network, and its advantages over traditional neural networks are demonstrated. Third, the attention mechanism (AM), widely used in many fields of deep learning, is introduced to take in the states at different time steps and weight them differently when forming the final value function. Finally, simulation experiments verify the effectiveness of the proposed AM-RPPO algorithm for the locomotion control of biped robots with high-dimensional states.
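The idea of weighting states from different time steps when forming the value function can be illustrated with a small sketch. It is not the network from the paper: the abstract does not give the architecture, so the Elman-style recurrent step, the dot-product attention with a single query vector, the layer sizes, and every name below (rnn_step, attention_value, init_params, W_x, W_h, query, v_out) are assumptions chosen only for illustration, and the parameters are random rather than trained.

```python
import math
import random


def rnn_step(W_x, W_h, b, x, h):
    """One Elman-style recurrent step: h' = tanh(W_x x + W_h h + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[i], x))
            + sum(w * hj for w, hj in zip(W_h[i], h))
            + b[i]
        )
        for i in range(len(b))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_value(observations, params):
    """Return (value, weights) for a sequence of observation vectors."""
    h = [0.0] * len(params["b"])
    hidden_states = []
    for x in observations:
        # The hidden state carries information from earlier observations,
        # which a memoryless multi-layer perceptron would not see.
        h = rnn_step(params["W_x"], params["W_h"], params["b"], x, h)
        hidden_states.append(h)

    # Assumed scoring rule: dot product of each hidden state with a query vector.
    scores = [
        sum(q * hi for q, hi in zip(params["query"], hs)) for hs in hidden_states
    ]
    weights = softmax(scores)

    # Context vector: attention-weighted sum of the per-step hidden states.
    context = [
        sum(w * hs[i] for w, hs in zip(weights, hidden_states))
        for i in range(len(h))
    ]
    # Linear read-out of the context gives the scalar value estimate.
    value = sum(v * c for v, c in zip(params["v_out"], context))
    return value, weights


def init_params(obs_dim, hidden_dim, seed=0):
    """Random, untrained parameters; hypothetical sizes for the demo only."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "W_x": [vec(obs_dim) for _ in range(hidden_dim)],
        "W_h": [vec(hidden_dim) for _ in range(hidden_dim)],
        "b": vec(hidden_dim),
        "query": vec(hidden_dim),
        "v_out": vec(hidden_dim),
    }


params = init_params(obs_dim=3, hidden_dim=4)
observations = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.4, 0.4, 0.0]]
value, weights = attention_value(observations, params)
```

Here weights holds one non-negative coefficient per time step and the coefficients sum to one, so the value estimate is a read-out of a weighted mixture of the recurrent states rather than of the last state alone. In an actor-critic setup such as PPO, a value head of this kind would be trained jointly with the policy; that training loop is omitted here.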
