Multilingual Computational Models Reveal Shared Brain Responses to 21 Languages
Computer Science
Natural Language Processing
Linguistics
Philosophy
Authors
Andrea Gregor de Varda, Saima Malik-Moraleda, Greta Tuckute, Evelina Fedorenko
Identifiers
DOI:10.1101/2025.02.01.636044
Abstract
At the heart of language neuroscience lies a fundamental question: How does the human brain process the rich variety of languages? Recent developments in Natural Language Processing, particularly in multilingual neural network language models, offer a promising avenue to answer this question by providing a theory-agnostic way of representing linguistic content across languages. Our study leverages these advances to ask how the brains of native speakers of 21 languages respond to linguistic stimuli, and to what extent linguistic representations are similar across languages. We combined existing (12 languages across 4 language families; n=24 participants) and newly collected fMRI data (9 languages across 4 language families; n=27 participants) to evaluate a series of encoding models predicting brain activity in the language network based on representations from diverse multilingual language models (20 models across 8 model classes). We found evidence of cross-lingual robustness in the alignment between language representations in artificial and biological neural networks. Critically, we showed that the encoding models can be transferred zero-shot across languages, so that a model trained to predict brain activity in a set of languages can account for brain responses in a held-out language, even across language families. These results imply a shared component in the processing of different languages, plausibly related to a shared meaning space.
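As a rough illustration of the encoding-model and zero-shot transfer approach the abstract describes, below is a minimal sketch in Python. All names and data shapes are illustrative assumptions, not the paper's actual pipeline: `embeddings[lang]` stands for multilingual-LM representations of the stimuli (n_stimuli × n_features) and `brain[lang]` for the corresponding language-network fMRI responses (n_stimuli × n_voxels); ridge regression and a per-voxel Pearson correlation score are common choices for such analyses, used here only as placeholders.

```python
# Hypothetical sketch of zero-shot cross-lingual encoding-model transfer.
# Assumed inputs (not from the paper):
#   embeddings[lang]: np.ndarray, shape (n_stimuli, n_features)
#   brain[lang]:      np.ndarray, shape (n_stimuli, n_voxels)
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

def zero_shot_transfer(embeddings, brain, held_out_lang, alpha=1.0):
    """Fit a ridge encoding model on all languages except `held_out_lang`,
    then score its predictions on the held-out language."""
    train_langs = [l for l in embeddings if l != held_out_lang]

    # Stack training stimuli across languages into one regression problem.
    X_train = np.vstack([embeddings[l] for l in train_langs])
    Y_train = np.vstack([brain[l] for l in train_langs])

    model = Ridge(alpha=alpha)
    model.fit(X_train, Y_train)

    # Predict brain responses for the unseen language.
    Y_pred = model.predict(embeddings[held_out_lang])
    Y_true = brain[held_out_lang]

    # Score: mean per-voxel Pearson correlation between predicted
    # and observed responses.
    scores = [pearsonr(Y_pred[:, v], Y_true[:, v])[0]
              for v in range(Y_true.shape[1])]
    return float(np.mean(scores))
```

Under this setup, a positive transfer score for a held-out language (including one from an unseen language family) would correspond to the cross-lingual generalization the abstract reports.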