Tool of evaluation - T4D (Thinking for Doing) dataset
Year: 2023

Dataset derived from ToMi for evaluating whether LLMs can use Theory-of-Mind inferences to choose appropriate actions.

The T4D dataset is a conversion of ToMi, following the paper “How Far Are Large Language Models from Agents with Theory-of-Mind?”. It tests whether models can go beyond answering explicit Theory-of-Mind questions and use inferred mental states to choose socially appropriate actions. The dataset contains 564 rows and is distributed through Hugging Face under the Apache 2.0 license. A linked GitHub repository provides the conversion code. The resource is suited to benchmarking situated Theory-of-Mind reasoning and action selection.

This dataset can be used during BSC’s secondment to CERTH-Psy in April 2026.

Developer of the tool of evaluation: sachithgunasekara (Hugging Face dataset card, based on Zhou et al.)

The tool of evaluation is available at the following link.

Christine Kakalou, CERTH
Published: Wednesday, 04 October 2023 - Last modified: Wednesday, 06 May 2026