PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning

Reinforcement learning, Computer science, Parametric statistics, Perturbation (astronomy), Action (physics), Population, Artificial intelligence, Mathematical optimization, Machine learning, Mathematics, Physics, Statistics, Demography, Quantum mechanics, Sociology
Authors
Shilei Li, Meng Li, Jiongming Su, Shaofei Chen, Zhimin Yuan, Qing Ye
Source
Journal: ACM Transactions on Intelligent Systems and Technology [Association for Computing Machinery]
Volume/Issue: 12(3): 1-21 · Cited by: 1
Identifier
DOI: 10.1145/3452008
Abstract

Efficient and stable exploration remains a key challenge for deep reinforcement learning (DRL) in high-dimensional action and state spaces. Recently, a promising line of work has combined exploration in the action space with exploration in the parameter space to get the best of both. In this article, we propose a new iterative, closed-loop framework that combines an evolutionary algorithm (EA), which explores in a gradient-free manner directly in the parameter space, with the actor-critic deep deterministic policy gradient (DDPG) reinforcement learning algorithm, which explores in a gradient-based manner in the action space, so that the two methods cooperate in a more balanced and efficient way. In our framework, the policies represented by the EA population (the parameter-perturbation part) evolve in a guided manner by exploiting the gradient information provided by DDPG, while the policy-gradient part (DDPG) is used only as a fine-tuning tool for the best individual in the EA population, which improves sample efficiency. In particular, we propose a criterion for determining the number of DDPG training steps per generation, ensuring that useful gradient information can be extracted from the samples generated by the EA and that the DDPG and EA parts work together in a balanced way in each generation. Furthermore, within the DDPG part, the algorithm can flexibly switch between fine-tuning the previous RL-Actor and fine-tuning a new actor generated by the EA, depending on the situation, to further improve efficiency. Experiments on a range of challenging continuous control benchmarks demonstrate that our algorithm outperforms related methods and offers a satisfactory trade-off between stability and sample efficiency.
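To make the loop described in the abstract concrete, the following is a minimal, self-contained sketch (not the authors' implementation) of how such an EA-plus-policy-gradient cycle could be structured. The environment is replaced by a toy quadratic fitness, the DDPG fine-tuning step is stood in for by plain gradient ascent on that fitness, and helpers such as fitness, mutate, and finetune are hypothetical names introduced purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
DIM, POP_SIZE, GENERATIONS, SIGMA = 8, 10, 50, 0.1
TARGET = rng.normal(size=DIM)  # toy optimum standing in for a high-return policy

def fitness(theta):
    # Toy stand-in for episodic return: higher is better, maximized at TARGET.
    return -float(np.sum((theta - TARGET) ** 2))

def mutate(theta):
    # Gradient-free parameter perturbation (the EA exploration step).
    return theta + SIGMA * rng.normal(size=theta.shape)

def finetune(theta, steps, lr=0.05):
    # Stand-in for DDPG fine-tuning: a few gradient-ascent steps on the toy objective.
    for _ in range(steps):
        grad = -2.0 * (theta - TARGET)  # analytic gradient of the toy fitness
        theta = theta + lr * grad
    return theta

population = [rng.normal(size=DIM) for _ in range(POP_SIZE)]
rl_actor = population[0].copy()

for gen in range(GENERATIONS):
    # 1) Evaluate the population: gradient-free exploration in parameter space.
    scores = [fitness(p) for p in population]
    best = population[int(np.argmax(scores))].copy()

    # 2) Choose what to fine-tune: the previous RL actor or the new EA champion,
    #    whichever currently performs better (the "switching" idea).
    seed = rl_actor if fitness(rl_actor) > fitness(best) else best.copy()

    # 3) Fine-tune it with the gradient-based learner. A fixed step budget is used
    #    here; the paper instead proposes a criterion for choosing this budget.
    rl_actor = finetune(seed, steps=20)

    # 4) Inject the fine-tuned actor back as an elite so its gradient information
    #    guides the next generation, and refill the rest by mutation.
    elites = [best, rl_actor.copy()]
    population = elites + [mutate(elites[i % 2]) for i in range(POP_SIZE - 2)]

print("best fitness:", max(fitness(p) for p in population))

Under these toy assumptions the loop converges toward TARGET; in the actual method, the gradient-based part is a full DDPG learner trained on the samples generated by the EA population rather than an analytic gradient.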
