Reinforcement learning
Computer science
Convergence (economics)
Pareto optimality
Pareto principle
Balance (ability)
Mathematical optimization
Grid
Orientation (vector space)
Artificial intelligence
Multi-objective optimization
Machine learning
Mathematics
Economics
Physical medicine and rehabilitation
Medicine
Economic growth
Geometry
Authors
Jinsheng Ren,Shangqi Guo,Feng Chen
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2022-11-01
Volume/Issue: 33 (11): 6458-6472
Citations: 2
Identifiers
DOI:10.1109/tnnls.2021.3080521
Abstract
Auxiliary rewards are widely used in complex reinforcement learning tasks. However, previous work can hardly avoid the interference of auxiliary rewards with the pursuit of the main rewards, which destroys the optimal policy. Thus, it is challenging but essential to balance the main and auxiliary rewards. In this article, we explicitly formulate the problem of reward balancing as the search for a Pareto optimal solution, with the overall objective of preserving the policy's optimization orientation toward the main rewards (i.e., the policy driven by the balanced rewards is consistent with the policy driven by the main rewards). To this end, we propose a variant of the Pareto optimal solution and show that it can effectively guide the policy search toward more main rewards. Furthermore, we establish an iterative learning framework for reward balancing and theoretically analyze its convergence and time complexity. Experiments in both discrete (grid world) and continuous (Doom) environments demonstrate that our algorithm can effectively balance rewards and achieves remarkable performance compared with RL algorithms using heuristically designed rewards. On the ViZDoom platform, our algorithm can learn expert-level policies.
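The abstract frames reward balancing as searching for a Pareto optimal solution over the main and auxiliary objectives. The sketch below is not the paper's algorithm; it only illustrates the underlying Pareto-dominance concept, using hypothetical (main return, auxiliary return) scores for candidate reward balancings:

```python
# Illustrative Pareto-dominance filter over candidate reward balancings.
# Each candidate is scored on two objectives: return under the main reward
# and return under the auxiliary reward (all numbers are hypothetical).

def dominates(a, b):
    """True if candidate a is at least as good as b on every objective
    and strictly better on at least one (maximization)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Return the non-dominated subset of (main_return, aux_return) pairs."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# Hypothetical scores for five different main/auxiliary reward balancings.
scores = [(10.0, 2.0), (8.0, 5.0), (9.0, 4.0), (7.0, 4.5), (10.0, 1.0)]
front = pareto_front(scores)
# (7.0, 4.5) is dominated by (8.0, 5.0), and (10.0, 1.0) by (10.0, 2.0),
# so neither appears on the front.
```

The paper's variant additionally biases the search among Pareto-optimal solutions toward those consistent with the main-reward-only policy; that selection step is beyond this minimal sketch.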