
arXiv:2310.18368 (v2, latest)

Muslim-Violence Bias Persists in Debiased GPT Models

25 October 2023
Babak Hemmatian
Razan Baltaji
Lav Varshney
Abstract

Abid et al. (2021) showed that GPT-3 is more likely to generate violent completions when prompted about Muslims than about other religions. Two pre-registered replication attempts found few violent completions and only the weakest anti-Muslim bias in the Instruct version, which was fine-tuned to eliminate biased and toxic outputs. However, additional pre-registered experiments showed that using common names associated with the religions in prompts increases the rate of violent completions several-fold, revealing a highly significant second-order bias against Muslims. Our content analysis revealed religion-specific violent themes containing highly offensive ideas regardless of prompt format. Replications with ChatGPT suggest that any effects of GPT-3's de-biasing have disappeared with continued model development, as this newer model showed both a strong Muslim-violence bias and rates of violent completions closer to those reported by Abid et al. (2021). Our results show the need for continual de-biasing of models in ways that address higher-order associations.
