A study showing that many generative language models reproduce social identity biases similar to those observed in humans.
Scientific paper - Generative language models exhibit social identity biases
Year: 2024
This paper investigates whether large language models exhibit social identity biases, including ingroup favoritism and outgroup hostility. Across a large set of models, the authors find that many base (foundational) models and some instruction-tuned models produce biased associations resembling those documented in human social psychology. The paper also discusses mitigation through training-data curation and instruction fine-tuning. The findings are relevant for fairness evaluation, human–AI interaction, and the potential reinforcement of social biases through conversational systems.
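To make the kind of evaluation described above concrete, here is a minimal sketch of one way to probe a model for ingroup/outgroup sentiment asymmetry. The model choice, prompt pair, and sentiment scorer are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch: compare sentiment of completions for ingroup vs. outgroup
# prompts. Prompts, model, and scorer are illustrative, not the paper's setup.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # any causal LM works
sentiment = pipeline("sentiment-analysis")             # generic sentiment scorer

prompts = {"ingroup": "We are", "outgroup": "They are"}

for group, prompt in prompts.items():
    completions = generator(
        prompt, max_new_tokens=20, num_return_sequences=5, do_sample=True
    )
    texts = [c["generated_text"] for c in completions]
    scores = sentiment(texts)
    # Share of positive completions as a crude proxy for favorability.
    pos = sum(s["label"] == "POSITIVE" for s in scores) / len(scores)
    print(f"{group}: positive completions = {pos:.0%}")
```

A gap between the ingroup and outgroup positivity rates would be one rough signal of the asymmetry the paper studies; a serious evaluation would use many more samples and validated scoring.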
Authors of the paper: Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek
Journal of publication: Nature Computational Science
The paper is available at the following link.