Knapsack problem
Computer science
Traveling salesman problem
Combinatorial optimization
Reinforcement learning
Mathematical optimization
Population
Scheduling (production processes)
Inference
Artificial intelligence
Set (abstract data type)
Machine learning
Theoretical computer science
Mathematics
Algorithm
Programming language
Demography
Sociology
Authors
Nathan Grinsztajn,Daniel Furelos-Blanco,Thomas D. Barrett
Source
Venue: Cornell University - arXiv
Date: 2022-01-01
Citations: 4
Identifier
DOI:10.48550/arxiv.2210.03475
Abstract
Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement additional search strategies, from stochastic sampling and beam search to explicit fine-tuning. In this paper, we argue for the benefits of learning a population of complementary policies, which can be simultaneously rolled out at inference. To this end, we introduce Poppy, a simple training procedure for populations. Instead of relying on a predefined or hand-crafted notion of diversity, Poppy induces an unsupervised specialization targeted solely at maximizing the performance of the population. We show that Poppy produces a set of complementary policies, and obtains state-of-the-art RL results on four popular NP-hard problems: traveling salesman, capacitated vehicle routing, 0-1 knapsack, and job-shop scheduling.
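The core idea in the abstract, training a population whose objective is the performance of its *best* member on each instance rather than the average, can be illustrated with a toy sketch. The following is a minimal, hypothetical illustration, not the paper's code: K independent softmax policies act on random per-instance reward tables, and a REINFORCE-style update (no baseline, for brevity) is applied only to the policy that achieved the highest reward on each instance. All names, sizes, and the toy reward setup are assumptions for illustration; the paper's actual objective, baselines, and network architecture may differ.

```python
import numpy as np

# Toy sketch (hypothetical, not the paper's code): K softmax policies over
# A discrete "solutions"; each instance has its own reward table. Only the
# best-performing policy per instance receives a gradient, so the population
# objective E[max_i R_i] improves and policies specialize without any
# hand-crafted diversity term.

rng = np.random.default_rng(0)
K, A, N = 4, 8, 32                     # policies, actions, instances (assumed)
logits = rng.normal(size=(K, A))       # one parameter vector per policy
rewards = rng.uniform(size=(N, A))     # per-instance reward of each action

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

lr = 0.5
for step in range(200):
    for i in rng.permutation(N):
        probs = softmax(logits)                        # (K, A)
        acts = [rng.choice(A, p=p) for p in probs]     # one rollout per policy
        rs = rewards[i, acts]                          # reward of each policy
        w = int(np.argmax(rs))                         # winning policy
        # REINFORCE update for the winner only: reward * grad log pi(a),
        # where grad log softmax = onehot(a) - probs.
        grad = -probs[w]
        grad[acts[w]] += 1.0
        logits[w] += lr * rs[w] * grad

# Population performance: best greedy reward across the K policies per instance.
greedy = softmax(logits).argmax(axis=1)                # (K,)
pop = rewards[np.arange(N)[:, None], greedy].max(axis=1)
print(f"mean best-of-{K} reward: {pop.mean():.3f}")
```

The winner-only update is what makes the specialization unsupervised: a policy that already performs best on some subset of instances is the only one reinforced there, so the population partitions the instance space on its own, matching the abstract's claim that no predefined notion of diversity is needed.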