Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.12649
Cited By
v1
v2
v3 (latest)
Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation
20 February 2024
Kristian Lum
Jacy Reese Anthis
Chirag Nagpal
Alex DÁmour
Alexander D’Amour
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation"
45 / 45 papers shown
Title
GenderBench: Evaluation Suite for Gender Biases in LLMs
Matúš Pikuliak
79
0
0
17 May 2025
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
Arjun Subramonian
Vagrant Gautam
Preethi Seshadri
Dietrich Klakow
Kai-Wei Chang
Ningyu Zhang
93
1
0
23 Apr 2025
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini
A. Orsino
Massimo Ruggiero
Domenico Talia
AAML
ELM
94
4
0
10 Apr 2025
LLM Social Simulations Are a Promising Research Method
Jacy Reese Anthis
Ryan Liu
Sean M. Richardson
Austin C. Kozlowski
Bernard Koch
James A. Evans
Erik Brynjolfsson
Michael S. Bernstein
ALM
93
15
0
03 Apr 2025
Toward an Evaluation Science for Generative AI Systems
Laura Weidinger
Deb Raji
Hanna M. Wallach
Margaret Mitchell
Angelina Wang
Olawale Salaudeen
Rishi Bommasani
Sayash Kapoor
Deep Ganguli
Sanmi Koyejo
EGVM
ELM
96
10
0
07 Mar 2025
Do LLMs exhibit demographic parity in responses to queries about Human Rights?
Rafiya Javed
Jackie Kay
David Yanni
Abdullah Zaini
Anushe Sheikh
Maribeth Rauh
Iason Gabriel
Laura Weidinger
95
0
0
26 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez-Llorca
ELM
227
6
0
10 Feb 2025
Towards Effective Discrimination Testing for Generative AI
Thomas P. Zollo
Nikita Rajaneesh
Richard Zemel
Talia B. Gillis
Emily Black
154
1
0
31 Dec 2024
HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter
Manuel Tonneau
Diyi Liu
Niyati Malhotra
Scott A. Hale
Samuel Fraiberger
Victor Orozco-Olvera
Paul Röttger
132
2
0
23 Nov 2024
Fairness Definitions in Language Models Explained
Thang Viet Doan
Zhibo Chu
Zichong Wang
Wenbin Zhang
ALM
81
10
0
26 Jul 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
99
7
0
11 Jul 2024
The Impossibility of Fair LLMs
Jacy Reese Anthis
Kristian Lum
Michael Ekstrand
Avi Feller
Alexander D’Amour
FaML
114
14
0
28 May 2024
WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao
Xiang Ren
Jack Hessel
Claire Cardie
Yejin Choi
Yuntian Deng
86
230
0
02 May 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
Giuseppe Attanasio
Beatrice Savoldi
Dennis Fucci
Dirk Hovy
75
9
0
28 Feb 2024
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
A. Salinas
Fred Morstatter
71
54
0
08 Jan 2024
Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness
Jacy Reese Anthis
Victor Veitch
64
16
0
30 Oct 2023
Sociotechnical Safety Evaluation of Generative AI Systems
Laura Weidinger
Maribeth Rauh
Nahema Marchal
Arianna Manzini
Lisa Anne Hendricks
...
Conor Griffin
Ben Bariach
Iason Gabriel
Verena Rieser
William S. Isaac
EGVM
56
139
0
18 Oct 2023
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters
Yixin Wan
George Pu
Jiao Sun
Aparna Garimella
Kai-Wei Chang
Nanyun Peng
91
192
0
13 Oct 2023
Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT
A. Veldanda
Fabian Grob
Shailja Thakur
Hammond Pearce
Benjamin Tan
Ramesh Karri
Siddharth Garg
67
18
0
08 Oct 2023
Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos
Ryan Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
101
594
0
02 Sep 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
396
12,044
0
18 Jul 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
402
1,964
0
07 Apr 2023
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
213
3,150
0
20 Oct 2022
Debiasing isn't enough! -- On the Effectiveness of Debiasing MLMs and their Social Biases in Downstream Tasks
Masahiro Kaneko
Danushka Bollegala
Naoaki Okazaki
62
46
0
06 Oct 2022
Social Simulacra: Creating Populated Prototypes for Social Computing Systems
J. Park
Lindsay Popowski
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
75
293
0
08 Aug 2022
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Aarohi Srivastava
Abhinav Rastogi
Abhishek Rao
Abu Awal Md Shoeb
Abubakar Abid
...
Zhuoye Zhao
Zijian Wang
Zijie J. Wang
Zirui Wang
Ziyi Wu
ELM
211
1,775
0
09 Jun 2022
On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations
Yang Trista Cao
Yada Pruksachatkun
Kai-Wei Chang
Rahul Gupta
Varun Kumar
Jwala Dhamala
Aram Galstyan
45
99
0
25 Mar 2022
Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for Support
Michael A. Madaio
Lisa Egede
Hariharan Subramonyam
Jennifer Wortman Vaughan
Hanna M. Wallach
63
147
0
10 Dec 2021
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
270
422
0
15 Oct 2021
Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models
Daniel de Vassimon Manela
D. Errington
Thomas Fisher
B. V. Breugel
Pasquale Minervini
46
94
0
24 Jan 2021
Intrinsic Bias Metrics Do Not Correlate with Application Bias
Seraphina Goldfarb-Tarrant
Rebecca Marchant
Ricardo Muñoz Sánchez
Mugdha Pandya
Adam Lopez
126
179
0
31 Dec 2020
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Nikita Nangia
Clara Vania
Rasika Bhalerao
Samuel R. Bowman
131
685
0
30 Sep 2020
Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases
W. Guo
Aylin Caliskan
39
243
0
06 Jun 2020
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
Su Lin Blodgett
Solon Barocas
Hal Daumé
Hanna M. Wallach
157
1,248
0
28 May 2020
StereoSet: Measuring stereotypical bias in pretrained language models
Moin Nadeem
Anna Bethke
Siva Reddy
101
1,014
0
20 Apr 2020
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
Jingqing Zhang
Yao-Min Zhao
Mohammad Saleh
Peter J. Liu
RALM
3DGS
297
2,051
0
18 Dec 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
264
10,851
0
29 Oct 2019
Measuring Bias in Contextualized Word Representations
Keita Kurita
Nidhi Vyas
Ayush Pareek
A. Black
Yulia Tsvetkov
106
451
0
18 Jun 2019
Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting
Maria De-Arteaga
Alexey Romanov
Hanna M. Wallach
J. Chayes
C. Borgs
Alexandra Chouldechova
S. Geyik
K. Kenthapadi
Adam Tauman Kalai
197
460
0
27 Jan 2019
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
Jieyu Zhao
Tianlu Wang
Mark Yatskar
Vicente Ordonez
Kai-Wei Chang
130
942
0
18 Apr 2018
Semantics derived automatically from language corpora contain human-like biases
Aylin Caliskan
J. Bryson
Arvind Narayanan
217
2,673
0
25 Aug 2016
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Tolga Bolukbasi
Kai-Wei Chang
James Zou
Venkatesh Saligrama
Adam Kalai
CVBM
FaML
112
3,150
0
21 Jul 2016
Enriching Word Vectors with Subword Information
Piotr Bojanowski
Edouard Grave
Armand Joulin
Tomas Mikolov
NAI
SSL
VLM
232
9,980
0
15 Jul 2016
Improved Techniques for Training GANs
Tim Salimans
Ian Goodfellow
Wojciech Zaremba
Vicki Cheung
Alec Radford
Xi Chen
GAN
486
9,067
0
10 Jun 2016
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
686
31,544
0
16 Jan 2013
1