AI Alignment: A Contemporary Survey

Authors
Jiaming Ji, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Borong Zhang, Dawei Hong, Hantao Lou, K. Wang, Yawen Duan, Zhonghao He, Lukas Vierling, Zhaowei Zhang, F. R. Zeng, Juntao Dai, Xuehai Pan, Hua Xu, Aidan O’Gara, Kwan Yee Ng, Brian Tse, Jie Fu
Source
Journal: ACM Computing Surveys [Association for Computing Machinery]
Citations: 3
Identifier
DOI: 10.1145/3770749
Abstract

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do the risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, this survey examines the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose it into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned through alignment training, while the latter aims to gain evidence about the systems’ alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. Specifically, we survey traditional preference modeling methods and reinforcement learning from human feedback, and further discuss potential frameworks for achieving scalable oversight on tasks where effective human oversight is hard to obtain. Within learning under distribution shift, we cover data distribution interventions, such as adversarial training, which help expand the distribution of training data, as well as algorithmic interventions to combat goal misgeneralization. On backward alignment, we discuss assurance techniques and governance practices. Specifically, we survey assurance methods applied to AI systems throughout their lifecycle, covering safety evaluation, interpretability, and human value compliance. We also discuss current and prospective governance practices adopted by governments, industry actors, and other third parties to manage existing and future AI risks. This survey aims to provide a comprehensive yet beginner-friendly review of alignment research topics.
Alongside the survey, we release and continually update the website www.alignmentsurvey.com, which features tutorials, paper collections, blog posts, and other resources.