Path (computing)
Discounting
Value (mathematics)
Mathematics
Selection (genetic algorithm)
Convergence (economics)
Artificial intelligence
Invariant (physics)
Rate of convergence
Group (periodic table)
Computer science
Variable (mathematics)
Factor (programming language)
Statistics
Mathematical optimization
Machine learning
Economics
Computer network
Mathematical analysis
Channel (broadcasting)
Chemistry
Organic chemistry
Finance
Mathematical physics
Programming language
Economic growth
Source
Journal: Journal of Physics: Conference Series (IOP Publishing)
Date: 2022-12-01
Volume/Issue: 2386 (1): 012037-012037
Citations: 2
Identifiers
DOI: 10.1088/1742-6596/2386/1/012037
Abstract
Because Q-learning is model-free — agents learn by interacting with the environment rather than from a model of it — the algorithm is widely applied to path planning. Nonetheless, the choice of parameter values has a crucial impact on the results. This paper examines how to determine appropriate values for the learning rate and the discount factor, and how these parameters affect overall performance. Agents with different learning-rate or discount-factor values run in randomly generated mazes, and their results are aggregated and compared. With the discount factor held fixed, a learning rate of 0.9 reaches convergence considerably faster than the other settings (0.6, 0.3, 0.1). With the learning rate held fixed, a discount factor of 0.9 finds shorter paths, and finds them faster, than the other groups (0.6, 0.3, 0.1). When both the learning rate and discount factor are set to 0.9 (against comparison groups at 1.0, 0.1, and 0), the 0.9 group is more stable than the 0.1 group and converges within 80 iterations, whereas convergence does not appear in the 1.0 and 0 groups within that span.