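Reinforcement learning
Ventral tegmental area
Reinforcement
Computer science
Artificial intelligence
Realization (probability)
Representation (politics)
Machine learning
Cognitive psychology
Dopamine
Psychology
Mathematics
Neuroscience
Statistics
Social psychology
Dopaminergic
Politics
Political science
Law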
Authors
Will Dabney, Zeb Kurth-Nelson, Naoshige Uchida, Clara Kwon Starkweather, Demis Hassabis, Rémi Munos, Matthew Botvinick
Source
Journal: Nature
[Nature Portfolio]
Date: 2020-01-15
Volume (issue): pages: 577 (7792): 671-675
Citations: 390
Identifiers
DOI: 10.1038/s41586-019-1924-6
Abstract
Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1–3]. According to the now-canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4–6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.

Analyses of single-cell recordings from mouse ventral tegmental area are consistent with a model of reinforcement learning in which the brain represents possible future rewards not as a single mean of stochastic outcomes, as in the canonical model, but instead as a probability distribution.
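The abstract contrasts a single scalar prediction, which learns the mean of stochastic outcomes, with a code that represents the whole reward distribution. The Python sketch below illustrates that contrast under stated assumptions. It uses a small population of value predictors, each of which scales positive and negative reward prediction errors by different weights (τ and 1 − τ). This asymmetric-update scheme, which makes each predictor converge to an expectile of the reward distribution, is one way distributional reinforcement learning can be realized. It is an assumption here, not something stated in the abstract. The function name `simulate`, its parameters `taus`, `n_trials`, `lr` and `seed`, and the bimodal reward are hypothetical.

```python
import random


def simulate(taus, n_trials=20000, lr=0.01, seed=0):
    """Train one value predictor per asymmetry level tau (illustrative sketch).

    Each predictor V_i is nudged by its reward prediction error delta = r - V_i,
    scaled by tau_i when delta > 0 and by (1 - tau_i) when delta < 0.
    tau = 0.5 gives a symmetric update whose fixed point is the mean (the
    scalar, "canonical" case); other tau values settle at other expectiles.
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(n_trials):
        # Hypothetical bimodal reward: 1.0 or 10.0 with equal probability.
        r = rng.choice([1.0, 10.0])
        for i, tau in enumerate(taus):
            delta = r - values[i]  # reward prediction error for predictor i
            scale = tau if delta > 0 else (1.0 - tau)  # asymmetric weighting
            values[i] += lr * scale * delta
    return values


if __name__ == "__main__":
    taus = [0.1, 0.5, 0.9]
    for tau, v in zip(taus, simulate(taus)):
        print(f"tau={tau:.1f}  learned value ~ {v:.2f}")
```

With equally likely rewards of 1 and 10, setting the expected update to zero gives a fixed point of 1 + 9τ for each predictor. The three predictors should therefore settle near 1.9, 5.5 and 9.1, up to sampling noise. The symmetric τ = 0.5 unit recovers the mean, as in the scalar account, while the population as a whole spans the spread of possible outcomes.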