Keywords
Divide-and-conquer algorithm, Reinforcement learning, Pareto principle, Computer science, Reinforcement, Front (military), Artificial intelligence, Mathematical economics, Economics, Psychology, Operations management, Engineering, Social psychology, Algorithm, Mechanical engineering
Authors
Willem Röpke,Mathieu Reymond,Patrick Mannion,Diederik M. Roijers,Ann Nowé,Roxana Rădulescu
Source
Venue: Cornell University - arXiv
Date: 2024-02-11
Identifier
DOI:10.48550/arxiv.2402.07182
Abstract
An important challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies to attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), which decomposes finding the Pareto front into a sequence of constrained single-objective problems. This enables us to guarantee convergence while providing an upper bound on the distance to undiscovered Pareto optimal solutions at each step. We evaluate IPRO using utility-based metrics and its hypervolume and find that it matches or outperforms methods that require additional assumptions. By leveraging problem-specific single-objective solvers, our approach also holds promise for applications beyond multi-objective reinforcement learning, such as planning and pathfinding.
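The abstract's core idea, decomposing the search for a Pareto front into a sequence of constrained single-objective problems driven by referent points, can be sketched on a toy problem. The sketch below is illustrative only: every name (`pareto_front_via_referents`, `solve_single_objective`) is hypothetical, and the brute-force oracle over a finite candidate set stands in for the problem-specific single-objective solver (e.g. an RL algorithm) that IPRO would actually use.

```python
# Hedged sketch of referent-driven Pareto front discovery (maximisation).
# Assumption: the "solver" is a brute-force oracle over a finite set of
# candidate value vectors; this is NOT the authors' implementation.

def dominates(a, b):
    """True if point a Pareto-dominates point b."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def solve_single_objective(candidates, referent):
    """Constrained single-objective oracle: among candidates strictly
    improving on the referent in every objective, return one maximising
    the total improvement; None if the region above the referent is empty."""
    feasible = [c for c in candidates if all(x > r for x, r in zip(c, referent))]
    if not feasible:
        return None
    return max(feasible, key=lambda c: sum(x - r for x, r in zip(c, referent)))

def pareto_front_via_referents(candidates):
    """Iterate: pick a referent, solve the constrained problem above it,
    and split the remaining search region into new referents."""
    n = len(candidates[0])
    front = []
    # Start below every candidate so the whole objective space is searched.
    referents = [tuple(min(c[i] for c in candidates) - 1 for i in range(n))]
    while referents:
        ref = referents.pop()
        point = solve_single_objective(candidates, ref)
        if point is None or point in front:
            continue
        front.append(point)
        # One new referent per objective, raising that coordinate to the
        # found point's value, covers the region not dominated by it.
        for i in range(n):
            new_ref = list(ref)
            new_ref[i] = point[i]
            referents.append(tuple(new_ref))
    # Keep only non-dominated points.
    return [p for p in front if not any(dominates(q, p) for q in front)]
```

On a small bi-objective instance such as `[(1, 3), (2, 2), (3, 1), (1, 1)]`, the loop recovers the three non-dominated points; each oracle call plays the role of one constrained single-objective problem in the sequence the abstract describes.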