Keywords
Terrain, Robot, Computer science, Control theory, Control, Control engineering, Simulation, Algorithm, Engineering, Artificial intelligence, Geography, Cartography
Authors
G. Zhang, Naijian Chen, Yiming Ji
Identifier
DOI: 10.1177/01423312251371751
Abstract
To address the challenges of poor body stability and limited generalization in quadruped locomotion over unstructured terrains, this paper proposes a locomotion control method based on an improved Long Short-Term Memory–Proximal Policy Optimization (LSTM-PPO) algorithm. An LSTM-based state processing module is integrated into the policy and value networks to handle the variation in input state length caused by complex terrain transitions. The PPO architecture is accordingly modified to support sequential state encoding. A distributed reinforcement learning framework is constructed, incorporating a multi-objective reward and penalty mechanism to enhance the adaptability of the learned policy across diverse environments. The training is conducted in a simulation environment built on Isaac Gym, where ablation studies are performed to validate the effectiveness of the LSTM module and the rationality of its hidden layer configuration. Comparative experiments against standard PPO, TD3, and DDPG demonstrate that the proposed algorithm achieves faster convergence, higher cumulative rewards, and more stable training. Finally, the learned policy is deployed and validated in both the Gazebo simulation platform and a real quadruped robot. Experimental results show that the proposed method enables the robot to effectively adapt to complex terrains such as grass, slopes, and stairs, exhibiting strong robustness and practical applicability.
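The abstract does not spell out the network details, so the following is only a minimal sketch of the core idea it describes: an LSTM module encodes the (possibly variable-length) observation sequence before the policy and value heads, on top of which the standard PPO clipped objective is applied. It is written in PyTorch; the class name LSTMActorCritic, the single-layer LSTM, the hidden size of 128, and the ppo_loss helper are all illustrative assumptions, not the paper's configuration (the paper itself ablates the hidden layer configuration).

import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    """Hypothetical sketch of an LSTM-PPO actor-critic; not the paper's exact model.

    An LSTM encodes the observation history; its final hidden feature feeds
    separate policy (Gaussian action) and value heads, one common way to
    realize the "sequential state encoding" the abstract describes.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, act_dim)  # action mean
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log std
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim); seq_len may vary between rollouts
        out, hidden = self.lstm(obs_seq, hidden)
        feat = out[:, -1]  # encoding at the last time step
        dist = torch.distributions.Normal(self.policy_head(feat), self.log_std.exp())
        value = self.value_head(feat).squeeze(-1)
        return dist, value, hidden

def ppo_loss(dist, actions, old_log_probs, advantages, clip_eps=0.2):
    # Standard PPO clipped surrogate objective applied on top of the LSTM features.
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

Because the recurrent state carries information across time steps, observation histories of different lengths can be fed to the same network, which is the property the abstract highlights for handling the variation in input state length caused by terrain transitions.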