Test (biology)
Emotional intelligence
Psychology
Clarity
Vocabulary
Consistency (knowledge base)
Cognitive psychology
Emotional intelligence assessment
Human intelligence
Social psychology
Diversity (politics)
Developmental psychology
Artificial intelligence
Computer science
Linguistics
Paleontology
Biochemistry
Chemistry
Philosophy
Sociology
Anthropology
Biology
Authors
Katja Schlegel, Nils R. Sommer, Marcello Mortillaro
Identifier
DOI: 10.1038/s44271-025-00258-x
Abstract
Large Language Models (LLMs) demonstrate expertise across diverse domains, yet their capacity for emotional intelligence remains uncertain. This research examined whether LLMs can solve and generate performance-based emotional intelligence tests. Results showed that ChatGPT-4, ChatGPT-o1, Gemini 1.5 flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3 outperformed humans on five standard emotional intelligence tests, achieving an average accuracy of 81%, compared to the 56% human average reported in the original validation studies. In a second step, ChatGPT-4 generated new test items for each emotional intelligence test. These new versions and the original tests were administered to human participants across five studies (total N = 467). Overall, original and ChatGPT-generated tests demonstrated statistically equivalent test difficulty. Perceived item clarity and realism, item content diversity, internal consistency, correlations with a vocabulary test, and correlations with an external ability emotional intelligence test were not statistically equivalent between original and ChatGPT-generated tests. However, all differences were smaller than Cohen’s d ± 0.25, and none of the 95% confidence interval boundaries exceeded a medium effect size (d ± 0.50). Additionally, original and ChatGPT-generated tests were strongly correlated (r = 0.46). These findings suggest that LLMs can generate responses that are consistent with accurate knowledge about human emotions and their regulation.
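For readers unfamiliar with the effect-size metric cited in the abstract: Cohen's d is the standardized mean difference between two groups. The formula below is the standard textbook definition with a pooled standard deviation; it is given here only as a reference and is not reproduced from the paper itself.

% Standard two-group Cohen's d with pooled standard deviation (textbook definition, not from the paper)
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}

Under the usual benchmarks (d ≈ 0.2 small, d ≈ 0.5 medium), the reported differences between original and ChatGPT-generated tests (all |d| < 0.25) sit at or below the small range, and the 95% confidence interval boundaries never reaching |d| = 0.50 is why the authors describe them as not exceeding a medium effect.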