Authors
Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Citations: 154
Identifier
DOI: 10.48550/arxiv.2202.05262
Abstract
We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/
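Since the abstract describes ROME as a rank-one update to mid-layer feed-forward weights, a small illustration may help. The sketch below is not the authors' released code (that is available at https://rome.baulab.info/); it is a minimal PyTorch rendering of a closed-form rank-one edit, with assumed placeholder names: `W` for an MLP projection matrix viewed as a linear associative memory, `k_star` and `v_star` for the key and target value vectors, and `C` for an estimate of the second moment of keys.

```python
import torch

def rank_one_edit(W: torch.Tensor, k_star: torch.Tensor,
                  v_star: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """Return an edited weight W_hat satisfying W_hat @ k_star == v_star.

    Treats W (d_out x d_in) as a linear associative memory storing
    key-value pairs. The update is rank one:
        W_hat = W + (v* - W k*) (C^-1 k*)^T / ((C^-1 k*)^T k*)
    so the value stored at k* changes while directions uncorrelated
    with k* (under the key statistics C) are minimally disturbed.
    """
    u = torch.linalg.solve(C, k_star)      # u = C^{-1} k*
    residual = v_star - W @ k_star         # correction needed at k*
    return W + torch.outer(residual, u) / torch.dot(u, k_star)

# Toy usage with random tensors (illustrative only; a real edit would
# take k*, v*, and C from a transformer's mid-layer MLP activations).
d_in, d_out = 64, 48
W = torch.randn(d_out, d_in)
k_star, v_star = torch.randn(d_in), torch.randn(d_out)
C = torch.eye(d_in)                        # stand-in for E[k k^T]
W_hat = rank_one_edit(W, k_star, v_star, C)
assert torch.allclose(W_hat @ k_star, v_star, atol=1e-4)
```

With `C` set to the identity, as in this toy, the formula reduces to the plain rank-one correction (v* - W k*) k*^T / ||k*||^2; the paper instead estimates the key statistics from many activations so that unrelated stored associations are preserved.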