AI Alignment: A Contemporary Survey

Authors
Jiaming Ji, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Borong Zhang, Dawei Hong, Hantao Lou, K. Wang, Yawen Duan, Zhonghao He, Lukas Vierling, Zhaowei Zhang, F. R. Zeng, Juntao Dai, Xuehai Pan, Hua Xu, Aidan O’Gara, Kwan Yee Ng, Brian Tse, Jie Fu
Source
Journal: ACM Computing Surveys [Association for Computing Machinery]
Citations: 3
Identifier
DOI: 10.1145/3770749
Abstract

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey, we delve into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose it into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems’ alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. Specifically, we survey traditional preference modeling methods and reinforcement learning from human feedback, and further discuss potential frameworks to reach scalable oversight for tasks where effective human oversight is hard to obtain. Within learning under distribution shift, we also cover data distribution interventions, such as adversarial training that helps expand the distribution of training data, and algorithmic interventions to combat goal misgeneralization. On backward alignment, we discuss assurance techniques and governance practices. Specifically, we survey assurance methods of AI systems throughout their lifecycle, covering safety evaluation, interpretability, and human value compliance. We discuss current and prospective governance practices adopted by governments, industry actors, and other third parties, aimed at managing existing and future AI risks. This survey aims to provide a comprehensive yet beginner-friendly review of alignment research topics.
Based on this, we also release and continually update the website www.alignmentsurvey.com, which features tutorials, collections of papers, blog posts, and other resources.