Paper testing large language models on false-belief tasks commonly used to assess Theory of Mind in humans.
Scientific paper - Evaluating large language models in theory of mind tasks
Year: 2024
This paper evaluates large language models on a battery of false-belief tasks widely used to assess Theory of Mind in humans. The results show that more recent models solve a substantial proportion of tasks on which smaller or older models fail, suggesting emergent social-reasoning capabilities in some systems. The study is significant because it translates a classic psychological construct into an AI evaluation setting and raises questions about interpretation, anthropomorphism, and the limits of benchmark-based claims about machine social cognition.
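To make the evaluation setup concrete, below is a minimal sketch of an unexpected-contents false-belief probe of the kind described above. The vignette wording, the query_model stub, and the scoring rule are illustrative assumptions, not the paper's actual stimuli or code; a real run would replace query_model with a call to the model under test.

```python
# Illustrative sketch of a false-belief evaluation; the vignette, the
# query_model stub, and the scoring rule are hypothetical examples,
# not the paper's actual stimuli or code.

VIGNETTE = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "Yet, the label on the bag says 'chocolate' and not 'popcorn'. "
    "Sam finds the bag. She has never seen it before and cannot see "
    "what is inside. She reads the label."
)
PROMPT = VIGNETTE + " She believes that the bag is full of"


def query_model(prompt: str) -> str:
    """Stand-in for a real LLM completion call; returns a canned answer
    here so the sketch runs without API credentials."""
    return " chocolate"


def false_belief_correct(completion: str) -> bool:
    # The belief-consistent answer reflects the label Sam read
    # ("chocolate"), not the bag's true contents ("popcorn").
    text = completion.lower()
    return "chocolate" in text and "popcorn" not in text


if __name__ == "__main__":
    completion = query_model(PROMPT)
    print(f"Model completion: {completion.strip()!r}")
    print("Passes false-belief check:", false_belief_correct(completion))
```

A passing model must attribute to the protagonist a belief that contradicts the true state of the world, which is the same criterion applied in human false-belief testing.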
Author of the paper: Michal Kosinski
Publisher or journal of publication: Proceedings of the National Academy of Sciences (PNAS)
The paper is available at the following link.
Christine Kakalou, CERTH
Published on: Monday, 29 April 2024 - Last modified: Wednesday, 06 May 2026