Uncovering inequalities in new knowledge learning by large language models across different languages

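Keywords: Inequality; Key (lock); Computer science; Process (computing); Productivity; Artificial intelligence; Face (sociological concept); Work (physics); Sociology; Knowledge management; Linguistics
Authors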
Chenglong Wang, Haoyu Tang, Xiyuan Yang, Yueqi Xie, Jina Suh, Sunayana Sitaram, Junming Huang, Yu Xie, Pengjun Zhao, Zhaoya Gong, Xing Xie, Fangzhao Wu
Source
Journal: Proceedings of the National Academy of Sciences of the United States of America [National Academy of Sciences]
Volume (issue): 122 (51): e2514626122 Cited by: 4
Identifier
DOI: 10.1073/pnas.2514626122
Abstract

As large language models (LLMs) gradually demonstrate their potential to boost productivity and become integral tools for problem-solving in daily life worldwide, understanding the linguistic inequalities they introduce is becoming increasingly important. Prior research has primarily focused on static analyses of disparities in existing knowledge and capabilities of LLMs across languages. However, LLMs are continuously evolving, acquiring new knowledge to provide current, relevant responses and deliver precise, expert-level answers in specific domains. Investigating linguistic inequalities within this dynamic learning process is, therefore, also essential. In this paper, we explore inequalities in new knowledge learning by LLMs across different languages and four key dimensions: effectiveness, transferability, prioritization, and robustness. Through extensive experiments in both in-context learning and fine-tuning settings, with proprietary and open-source models, we reveal four key findings: 1) LLMs face greater challenges in efficiently and accurately learning new knowledge in lower-resource languages; 2) knowledge learned by LLMs tends to be more easily transferred to higher-resource languages than to lower-resource ones; 3) new knowledge in higher-resource languages is more likely to be retained and prioritized; and 4) LLMs are more robust against incorrect or misleading information in higher-resource languages. We further analyze the underlying causes of these inequalities from linguistic perspectives, pretraining characteristics, and tokenizer design, and propose a preliminary mitigation strategy through the lens of linguistic neurons. This work highlights the urgent need to recognize and address emerging linguistic inequalities in the development of LLMs.