Reinforcement Learning
Artificial Intelligence
Computer Science
Machine Learning
Mathematical Optimization
Mathematics
Authors
Shangding Gu,Bilgehan Sel,Yuhao Ding,Lu Wang,Qingwei Lin,Alois Knoll,Ming Jin
Identifier
DOI:10.1109/tpami.2025.3528944
Abstract
In numerous reinforcement learning (RL) problems involving safety-critical systems, a key challenge lies in balancing multiple objectives while simultaneously meeting all stringent safety constraints. To tackle this issue, we propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence. Our method employs a novel natural policy gradient manipulation method to optimize multiple RL objectives and overcome conflicting gradients between different objectives, since a simple weighted-average gradient direction can harm specific objectives when their gradients are misaligned. When a hard constraint is violated, our algorithm steps in to rectify the policy and minimize this violation. We establish theoretical convergence and constraint-violation guarantees, and our proposed method outperforms prior state-of-the-art methods on challenging safe multi-objective RL tasks.
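To make the gradient-manipulation idea described in the abstract concrete, below is a minimal sketch of how conflicting per-objective gradients can be reconciled before a primal policy update. This uses a PCGrad-style projection as one illustrative conflict-resolution rule and a hypothetical `constrained_update` helper; the paper's actual natural policy gradient manipulation and rectification procedure may differ.

```python
import numpy as np

def resolve_conflicting_gradients(grads):
    """Combine per-objective gradients, projecting out pairwise conflicts.

    PCGrad-style heuristic used only as an illustration of gradient
    manipulation; not the paper's exact update rule.
    """
    adjusted = [g.copy() for g in grads]
    for i, g_i in enumerate(adjusted):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = np.dot(g_i, g_j)
            if dot < 0:  # gradients conflict: remove the conflicting component
                g_i -= dot / (np.dot(g_j, g_j) + 1e-12) * g_j
    return np.mean(adjusted, axis=0)

def constrained_update(theta, objective_grads, constraint_violation,
                       constraint_grad, lr=0.05):
    """One primal step (hypothetical helper): ascend the manipulated
    multi-objective gradient when constraints hold, otherwise descend
    the constraint violation to rectify the policy."""
    if constraint_violation > 0:
        return theta - lr * constraint_grad
    return theta + lr * resolve_conflicting_gradients(objective_grads)
```

In this sketch, the rectification branch mirrors the abstract's description of stepping in when a hard constraint is violated, while the projection step illustrates why a plain weighted average of misaligned objective gradients can be replaced by a manipulated direction.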