Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
Myra Cheng, Esin Durmus, Dan Jurafsky (29 May 2023, arXiv:2305.18189)

Papers citing "Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models" (35 papers)

A Comprehensive Analysis of Large Language Model Outputs: Similarity, Diversity, and Bias
Brandon Smith, Mohamed Reda Bouadjenek, Tahsin Alamgir Kheya, Phillip Dawson, S. Aryal (14 May 2025) [ALM, ELM]

Improving Language Model Personas via Rationalization with Psychological Scaffolds
Brihi Joshi, Xiang Ren, Swabha Swayamdipta, Rik Koncel-Kedziorski, Tim Paek (25 Apr 2025)

What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
Michael A. Hedderich, Anyi Wang, Raoyuan Zhao, Florian Eichin, Barbara Plank (22 Apr 2025)

Mind the Language Gap: Automated and Augmented Evaluation of Bias in LLMs for High- and Low-Resource Languages
Alessio Buscemi, Cedric Lothritz, Sergio Morales, Marcos Gomez-Vazquez, Robert Clarisó, Jordi Cabot, German Castignani (19 Apr 2025)

Fair Text Classification via Transferable Representations
Thibaud Leteno, Michael Perrot, Charlotte Laclau, Antoine Gourru, Christophe Gravier (10 Mar 2025) [FaML]

VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models
Jen-tse Huang, Jiantong Qin, Jianping Zhang, Youliang Yuan, Wenxuan Wang, Jieyu Zhao (10 Mar 2025) [VLM]

An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts
Dhia Elhaq Rzig, Dhruba Jyoti Paul, Kaiser Pister, Jordan Henkel, Foyzul Hassan (21 Jan 2025)

Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection
Yachao Zhao, Bo Wang, Yan Wang (04 Jan 2025)

Evaluating the Prompt Steerability of Large Language Models
Erik Miehling, Michael Desmond, K. Ramamurthy, Elizabeth M. Daly, Pierre L. Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu (19 Nov 2024) [LLMSV]

Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models
Minh Duc Bui, K. Wense, Anne Lauscher (06 Nov 2024) [VLM]

LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education
Iain Xie Weissburg, Sathvika Anand, Sharon Levy, Haewon Jeong (17 Oct 2024)

On the Influence of Gender and Race in Romantic Relationship Prediction from Large Language Models
Abhilasha Sancheti, Haozhe An, Rachel Rudinger (05 Oct 2024)

Agentic Society: Merging skeleton from real world and texture from Large Language Model
Yuqi Bai, Kun Sun, Huishi Yin (02 Sep 2024)

LLMs generate structurally realistic social networks but overestimate political homophily
Serina Chang, Alicja Chaszczewicz, Emma Wang, Maya Josifovska, Emma Pierson, J. Leskovec (29 Aug 2024)

Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky (27 Aug 2024)

Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith (12 Aug 2024)

Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance
Manon Reusens, Philipp Borchert, Jochen De Weerdt, Bart Baesens (25 Jun 2024)

Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi (17 Jun 2024)

Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender?
Haozhe An, Christabel Acquaye, Colin Wang, Zongxia Li, Rachel Rudinger (15 Jun 2024)

PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak (12 Jun 2024)

Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models
Jisu Shin, Hoyun Song, Huije Lee, Soyeong Jeong, Jong C. Park (06 Jun 2024)

The Impact of Unstated Norms in Bias Analysis of Language Models
Farnaz Kohankhaki, David B. Emerson, Laleh Seyyed-Kalantari, Faiza Khan Khattak (04 Apr 2024)

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie (07 Mar 2024) [OffRL]

Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution
Flor Miriam Plaza del Arco, Amanda Cercas Curry, Alba Curry, Gavin Abercrombie, Dirk Hovy (05 Mar 2024)

Random Silicon Sampling: Simulating Human Sub-Population Opinion Using a Large Language Model Based on Group-Level Demographic Information
Seungjong Sun, Eungu Lee, Dongyan Nan, Xiangying Zhao, Wonbyung Lee, Bernard J. Jansen, Jang Hyun Kim (28 Feb 2024)

Canvil: Designerly Adaptation for LLM-Powered User Experiences
K. J. Kevin Feng, Q. V. Liao, Ziang Xiao, Jennifer Wortman Vaughan, Amy X. Zhang, David W. McDonald (17 Jan 2024)

ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences
Yuanhe Tian, Ruyi Gan, Yan Song, Jiaxing Zhang, Yongdong Zhang (10 Nov 2023) [AI4MH, AI4CE, LM&MA]

Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting
Tilman Beck, Hendrik Schuff, Anne Lauscher, Iryna Gurevych (13 Sep 2023)

FairMonitor: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models
Yanhong Bai, Jiabao Zhao, Jinxin Shi, Tingjiang Wei, Xingjiao Wu, Liangbo He (21 Aug 2023)

A Survey on Fairness in Large Language Models
Yingji Li, Mengnan Du, Rui Song, Xin Wang, Ying Wang (20 Aug 2023) [ALM]

ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five other Low-Resource Languages
Sourojit Ghosh, Aylin Caliskan (17 May 2023)

"I'm sorry to hear that": Finding New Biases in Language Models with a
  Holistic Descriptor Dataset
"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset
Eric Michael Smith
Melissa Hall
Melanie Kambadur
Eleonora Presani
Adina Williams
79
130
0
18 May 2022
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe (04 Mar 2022) [OSLM, ALM]

BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Sam Bowman (15 Oct 2021)

Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick, Sahana Udupa, Hinrich Schütze (28 Feb 2021)