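Reinforcement learning
Artificial intelligence
Computer science
Action (physics)
Field (mathematics)
Artificial neural network
Function (biology)
State space
Robotics
Control (management)
Machine learning
Robot
Mathematics
Physics
Quantum mechanics
Evolutionary biology
Pure mathematics
Biology
Statistics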
Identifier
DOI:10.1109/caibda53561.2021.00025
Abstract
This study reviews the major developments of the Deep Deterministic Policy Gradient (DDPG) algorithm in reinforcement learning. DDPG builds on ideas from the Deep Q-Network (DQN) and can handle more challenging problems that operate over continuous action spaces. Its main idea is to use an actor-critic architecture (shown in Figure 5) to learn more competitive policies, with neural network function approximators that allow learning in large state and action spaces. Owing to this capacity, DDPG has many useful applications to real-world problems in fields such as robotics and control systems. However, like most model-free reinforcement learning methods, DDPG still requires a large number of training steps, which remains a major difficulty.
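The actor-critic idea described in the abstract can be illustrated with a minimal sketch. It is not the reviewed paper's implementation: it assumes linear function approximators in NumPy instead of neural networks, and every name and value in it (actor, critic, td_target, soft_update, GAMMA, TAU, the dimensions) is hypothetical and chosen only for illustration. It shows two pieces commonly associated with DDPG: the critic's bootstrapped target computed with target copies of the actor and critic, and the slow "soft" update of those target parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 3, 1   # assumed toy dimensions
GAMMA, TAU = 0.99, 0.005       # assumed discount factor and soft-update rate


def actor(W, s):
    """Deterministic policy: maps a batch of states to actions in [-1, 1]."""
    return np.tanh(s @ W)


def critic(w, s, a):
    """Linear Q-value estimate for each (state, action) pair in the batch."""
    return np.concatenate([s, a], axis=1) @ w


def td_target(r, s_next, done, W_targ, w_targ):
    """Critic target: r + gamma * (1 - done) * Q_target(s', actor_target(s'))."""
    a_next = actor(W_targ, s_next)
    return r + GAMMA * (1.0 - done) * critic(w_targ, s_next, a_next)


def soft_update(target, online, tau=TAU):
    """Move target parameters a small step toward the online parameters."""
    return (1.0 - tau) * target + tau * online


# Online parameters and their target copies (randomly initialised here).
W = rng.normal(size=(STATE_DIM, ACTION_DIM))
w = rng.normal(size=(STATE_DIM + ACTION_DIM,))
W_targ, w_targ = W.copy(), w.copy()

# A made-up batch of four transitions; the third one ends its episode.
s_next = rng.normal(size=(4, STATE_DIM))
r = rng.normal(size=4)
done = np.array([0.0, 0.0, 1.0, 0.0])

y = td_target(r, s_next, done, W_targ, w_targ)
W_targ = soft_update(W_targ, W)
w_targ = soft_update(w_targ, w)
```

In a full training loop the critic would be regressed toward y and the actor would be updated to increase the critic's value of its own actions; both steps are omitted here to keep the sketch short.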