Keywords
Reinforcement learning
Computer science
Markov decision process
Mathematical optimization
State space
Artificial intelligence
Optimal control
Machine learning
Convergence (economics)
Markov process
Mathematics
Statistics
Economics
Economic growth
Authors
Xiangkun He,Zhongxu Hu,Haohan Yang,Chen Lv
Source
Journal: Neurocomputing
[Elsevier BV]
Date: 2023-11-02
Volume: 565, Article no.: 126986
Citations: 3
Identifier
DOI:10.1016/j.neucom.2023.126986
Abstract
Reinforcement learning is capable of providing state-of-the-art performance in end-to-end robotic control tasks. Nevertheless, many real-world control tasks necessitate balancing multiple conflicting objectives while simultaneously ensuring that the learned policies adhere to constraints. Additionally, individual users typically prefer to explore personalized and diversified robotic control modes via specific preferences. Therefore, this paper presents a novel constrained multi-objective reinforcement learning algorithm for personalized end-to-end robotic control with continuous actions, allowing a single trained model to approximate the Pareto optimal policies for any user-specified preferences. The proposed approach is formulated as a constrained multi-objective Markov decision process, incorporating a nonlinear constraint design to facilitate the agent in learning optimal policies that align with specified user preferences across the entire preference space. Meanwhile, a comprehensive index based on hypervolume and entropy is presented to measure the convergence, diversity and evenness of the learned control policies. The proposed scheme is evaluated on nine multi-objective end-to-end robotic control tasks with continuous action spaces, and its effectiveness is demonstrated in comparison with competitive baselines, including classical and state-of-the-art algorithms.