AI Alignment: A Contemporary Survey

Authors
Jiaming Ji, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Borong Zhang, Dawei Hong, Hantao Lou, K. Wang, Yawen Duan, Zhonghao He, Lukas Vierling, Zhaowei Zhang, F. R. Zeng, Juntao Dai, Xuehai Pan, Hua Xu, Aidan O’Gara, Kwan Yee Ng, Brian Tse, Jie Fu
Source
Journal: ACM Computing Surveys [Association for Computing Machinery]
Citations: 3
Identifier
DOI: 10.1145/3770749
Abstract

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey, we delve into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose it into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems’ alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. Specifically, we survey traditional preference modeling methods and reinforcement learning from human feedback, and further discuss potential frameworks to reach scalable oversight for tasks where effective human oversight is hard to obtain. Within learning under distribution shift, we also cover data distribution interventions, such as adversarial training, which helps expand the distribution of training data, and algorithmic interventions to combat goal misgeneralization. On backward alignment, we discuss assurance techniques and governance practices. Specifically, we survey assurance methods of AI systems throughout their lifecycle, covering safety evaluation, interpretability, and human value compliance. We discuss current and prospective governance practices adopted by governments, industry actors, and other third parties, aimed at managing existing and future AI risks. This survey aims to provide a comprehensive yet beginner-friendly review of alignment research topics. Based on this, we also release and continually update the website www.alignmentsurvey.com, which features tutorials, collections of papers, blog posts, and other resources.
