Is It Bad to Work All the Time? Cross-Cultural Evaluation of Social Norm Biases in GPT-4

LLMs have been demonstrated to align with the values of Western or North American cultures. Prior work predominantly showed this effect using surveys that directly ask respondents (originally people, and now also LLMs) about their values. However, it is hard to believe that LLMs would consistently apply those values in real-world scenarios. To address this, we take a bottom-up approach, asking LLMs to reason about cultural norms in narratives from different cultures. We find that GPT-4 tends to generate norms that, while not necessarily incorrect, are significantly less culture-specific. In addition, while it avoids overtly generating stereotypes, the stereotypical representations of certain cultures are merely hidden rather than suppressed in the model, and such stereotypes can be easily recovered. Addressing these challenges is a crucial step towards developing LLMs that fairly serve their diverse user base.
@article{liu2025_2505.18322,
  title={Is It Bad to Work All the Time? Cross-Cultural Evaluation of Social Norm Biases in GPT-4},
  author={Zhuozhuo Joy Liu and Farhan Samir and Mehar Bhatia and Laura K. Nelson and Vered Shwartz},
  journal={arXiv preprint arXiv:2505.18322},
  year={2025}
}