Computer science
Coding
Pairwise comparison
Weighting
Machine learning
Artificial intelligence
Preference learning
Set (abstract data type)
Feature (linguistics)
Task (project management)
Preference
Function (biology)
Predictive power
Artificial neural network
Mathematics
Engineering
Philosophy
Radiology
Epistemology
Statistics
Gene
Biology
Chemistry
Programming language
Systems engineering
Evolutionary biology
Medicine
Biochemistry
Linguistics
Authors
Sydney M. Katz, Amir Maleki, Erdem Bıyık, Mykel J. Kochenderfer
Source
Journal: Cornell University - arXiv
Date: 2021-01-01
Cited by: 3
Identifier
DOI: 10.48550/arxiv.2103.02727
Abstract
Preference-based learning of reward functions, where the reward function is learned from comparison data, has been well studied for complex robotic tasks such as autonomous driving. Existing algorithms have focused on learning reward functions that are linear in a set of trajectory features. The features are typically hand-coded, and preference-based learning is used to determine a particular user's relative weighting of each feature. Designing a representative set of features to encode reward is challenging and can result in inaccurate models that fail to capture the user's preferences or to perform the task properly. In this paper, we present a method to learn both the relative weighting among features and additional features that help encode a user's reward function. The additional features are modeled as a neural network trained on data from pairwise comparison queries. We apply our method to a driving scenario used in previous work and compare its predictive power to that of hand-coded features alone. We perform additional analysis to interpret the learned features and examine the optimal trajectories. Our results show that adding a learned feature to the reward model enhances both its predictive power and its expressiveness, producing unique results for each user.
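To make the setup concrete, below is a minimal sketch of the kind of reward model the abstract describes: a reward that is linear in hand-coded trajectory features plus one additional feature produced by a neural network, trained on pairwise comparison labels under a Bradley-Terry-style likelihood (a standard choice in preference-based reward learning; the paper's exact formulation may differ). All class names, dimensions, and hyperparameters here are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Reward = learned weights over hand-coded features plus one extra
    feature from a small neural network (illustrative sketch)."""
    def __init__(self, n_hand_features: int, traj_dim: int):
        super().__init__()
        # Relative weighting over the hand-coded features and the learned one.
        self.weights = nn.Parameter(0.1 * torch.randn(n_hand_features + 1))
        # Neural network mapping a raw trajectory vector to one extra feature.
        self.learned_feature = nn.Sequential(
            nn.Linear(traj_dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, hand_features, raw_traj):
        # hand_features: (batch, n_hand_features); raw_traj: (batch, traj_dim)
        phi = torch.cat([hand_features, self.learned_feature(raw_traj)], dim=-1)
        return phi @ self.weights  # scalar reward per trajectory

def preference_loss(model, traj_a, traj_b, prefers_a):
    """Bradley-Terry likelihood for a pairwise comparison query:
    P(a preferred over b) = sigmoid(R(a) - R(b))."""
    r_a = model(*traj_a)
    r_b = model(*traj_b)
    return nn.functional.binary_cross_entropy_with_logits(
        r_a - r_b, prefers_a.float()
    )

# Hypothetical usage: one gradient step on a batch of comparison queries.
model = RewardModel(n_hand_features=4, traj_dim=64)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
traj_a = (torch.randn(8, 4), torch.randn(8, 64))
traj_b = (torch.randn(8, 4), torch.randn(8, 64))
prefers_a = torch.randint(0, 2, (8,))  # 1 if the user preferred trajectory a
loss = preference_loss(model, traj_a, traj_b, prefers_a)
loss.backward()
opt.step()
```

Training the feature weights and the network jointly on comparison data is what lets the model recover user-specific structure that a fixed, hand-coded feature set would miss.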