Dataset derived from ToMi for evaluating whether LLMs can use Theory-of-Mind inferences to choose appropriate actions.
Tool of evaluation - T4D (Thinking for Doing) dataset
Year: 2023
The T4D dataset is a conversion of ToMi that follows the paper “How Far Are Large Language Models from Agents with Theory-of-Mind?”. It is designed to test whether models can move beyond answering explicit Theory-of-Mind questions and instead use inferred mental states to decide on socially appropriate actions. The dataset contains 564 rows and is distributed through Hugging Face under an Apache 2.0 license. A linked GitHub repository provides the conversion code. The resource is suited to benchmarking situated Theory-of-Mind reasoning and action selection.
This dataset can be used during BSC’s secondment to CERTH-Psy in April 2026.
Developer of the tool of evaluation: sachithgunasekara (Hugging Face dataset card), based on Zhou et al.
The tool of evaluation is available at the following link.