ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.12649
  4. Cited By
Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation
v1v2v3 (latest)

Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation

20 February 2024
Kristian Lum
Jacy Reese Anthis
Chirag Nagpal
Alex DÁmour
Alexander D’Amour
ArXiv (abs)PDFHTML

Papers citing "Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation"

45 / 45 papers shown
Title
GenderBench: Evaluation Suite for Gender Biases in LLMs
GenderBench: Evaluation Suite for Gender Biases in LLMs
Matúš Pikuliak
79
0
0
17 May 2025
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
Arjun Subramonian
Vagrant Gautam
Preethi Seshadri
Dietrich Klakow
Kai-Wei Chang
Ningyu Zhang
93
1
0
23 Apr 2025
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini
A. Orsino
Massimo Ruggiero
Domenico Talia
AAMLELM
94
4
0
10 Apr 2025
LLM Social Simulations Are a Promising Research Method
LLM Social Simulations Are a Promising Research Method
Jacy Reese Anthis
Ryan Liu
Sean M. Richardson
Austin C. Kozlowski
Bernard Koch
James A. Evans
Erik Brynjolfsson
Michael S. Bernstein
ALM
93
15
0
03 Apr 2025
Toward an Evaluation Science for Generative AI Systems
Laura Weidinger
Deb Raji
Hanna M. Wallach
Margaret Mitchell
Angelina Wang
Olawale Salaudeen
Rishi Bommasani
Sayash Kapoor
Deep Ganguli
Sanmi Koyejo
EGVMELM
96
10
0
07 Mar 2025
Do LLMs exhibit demographic parity in responses to queries about Human Rights?
Do LLMs exhibit demographic parity in responses to queries about Human Rights?
Rafiya Javed
Jackie Kay
David Yanni
Abdullah Zaini
Anushe Sheikh
Maribeth Rauh
Iason Gabriel
Laura Weidinger
95
0
0
26 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez-Llorca
ELM
227
6
0
10 Feb 2025
Towards Effective Discrimination Testing for Generative AI
Towards Effective Discrimination Testing for Generative AI
Thomas P. Zollo
Nikita Rajaneesh
Richard Zemel
Talia B. Gillis
Emily Black
154
1
0
31 Dec 2024
HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter
HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter
Manuel Tonneau
Diyi Liu
Niyati Malhotra
Scott A. Hale
Samuel Fraiberger
Victor Orozco-Olvera
Paul Röttger
132
2
0
23 Nov 2024
Fairness Definitions in Language Models Explained
Fairness Definitions in Language Models Explained
Thang Viet Doan
Zhibo Chu
Zichong Wang
Wenbin Zhang
ALM
81
10
0
26 Jul 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
99
7
0
11 Jul 2024
The Impossibility of Fair LLMs
The Impossibility of Fair LLMs
Jacy Reese Anthis
Kristian Lum
Michael Ekstrand
Avi Feller
Alexander D’Amour
FaML
114
14
0
28 May 2024
WildChat: 1M ChatGPT Interaction Logs in the Wild
WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao
Xiang Ren
Jack Hessel
Claire Cardie
Yejin Choi
Yuntian Deng
84
230
0
02 May 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models
  Exhibit Gender Performance Gaps
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
Giuseppe Attanasio
Beatrice Savoldi
Dennis Fucci
Dirk Hovy
75
8
0
28 Feb 2024
The Butterfly Effect of Altering Prompts: How Small Changes and
  Jailbreaks Affect Large Language Model Performance
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
A. Salinas
Fred Morstatter
71
54
0
08 Jan 2024
Causal Context Connects Counterfactual Fairness to Robust Prediction and
  Group Fairness
Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness
Jacy Reese Anthis
Victor Veitch
64
16
0
30 Oct 2023
Sociotechnical Safety Evaluation of Generative AI Systems
Sociotechnical Safety Evaluation of Generative AI Systems
Laura Weidinger
Maribeth Rauh
Nahema Marchal
Arianna Manzini
Lisa Anne Hendricks
...
Conor Griffin
Ben Bariach
Iason Gabriel
Verena Rieser
William S. Isaac
EGVM
54
139
0
18 Oct 2023
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in
  LLM-Generated Reference Letters
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters
Yixin Wan
George Pu
Jiao Sun
Aparna Garimella
Kai-Wei Chang
Nanyun Peng
89
192
0
13 Oct 2023
Are Emily and Greg Still More Employable than Lakisha and Jamal?
  Investigating Algorithmic Hiring Bias in the Era of ChatGPT
Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT
A. Veldanda
Fabian Grob
Shailja Thakur
Hammond Pearce
Benjamin Tan
Ramesh Karri
Siddharth Garg
65
18
0
08 Oct 2023
Bias and Fairness in Large Language Models: A Survey
Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos
Ryan Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
99
593
0
02 Sep 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MHALM
396
12,044
0
18 Jul 2023
Generative Agents: Interactive Simulacra of Human Behavior
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&RoAI4CE
402
1,964
0
07 Apr 2023
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLMLRM
208
3,150
0
20 Oct 2022
Debiasing isn't enough! -- On the Effectiveness of Debiasing MLMs and
  their Social Biases in Downstream Tasks
Debiasing isn't enough! -- On the Effectiveness of Debiasing MLMs and their Social Biases in Downstream Tasks
Masahiro Kaneko
Danushka Bollegala
Naoaki Okazaki
62
46
0
06 Oct 2022
Social Simulacra: Creating Populated Prototypes for Social Computing
  Systems
Social Simulacra: Creating Populated Prototypes for Social Computing Systems
J. Park
Lindsay Popowski
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
75
293
0
08 Aug 2022
Beyond the Imitation Game: Quantifying and extrapolating the
  capabilities of language models
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Aarohi Srivastava
Abhinav Rastogi
Abhishek Rao
Abu Awal Md Shoeb
Abubakar Abid
...
Zhuoye Zhao
Zijian Wang
Zijie J. Wang
Zirui Wang
Ziyi Wu
ELM
208
1,775
0
09 Jun 2022
On the Intrinsic and Extrinsic Fairness Evaluation Metrics for
  Contextualized Language Representations
On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations
Yang Trista Cao
Yada Pruksachatkun
Kai-Wei Chang
Rahul Gupta
Varun Kumar
Jwala Dhamala
Aram Galstyan
45
99
0
25 Mar 2022
Assessing the Fairness of AI Systems: AI Practitioners' Processes,
  Challenges, and Needs for Support
Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for Support
Michael A. Madaio
Lisa Egede
Hariharan Subramonyam
Jennifer Wortman Vaughan
Hanna M. Wallach
63
147
0
10 Dec 2021
BBQ: A Hand-Built Bias Benchmark for Question Answering
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
270
422
0
15 Oct 2021
Stereotype and Skew: Quantifying Gender Bias in Pre-trained and
  Fine-tuned Language Models
Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models
Daniel de Vassimon Manela
D. Errington
Thomas Fisher
B. V. Breugel
Pasquale Minervini
46
94
0
24 Jan 2021
Intrinsic Bias Metrics Do Not Correlate with Application Bias
Intrinsic Bias Metrics Do Not Correlate with Application Bias
Seraphina Goldfarb-Tarrant
Rebecca Marchant
Ricardo Muñoz Sánchez
Mugdha Pandya
Adam Lopez
126
179
0
31 Dec 2020
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked
  Language Models
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Nikita Nangia
Clara Vania
Rasika Bhalerao
Samuel R. Bowman
131
685
0
30 Sep 2020
Detecting Emergent Intersectional Biases: Contextualized Word Embeddings
  Contain a Distribution of Human-like Biases
Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases
W. Guo
Aylin Caliskan
39
243
0
06 Jun 2020
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
Su Lin Blodgett
Solon Barocas
Hal Daumé
Hanna M. Wallach
157
1,248
0
28 May 2020
StereoSet: Measuring stereotypical bias in pretrained language models
StereoSet: Measuring stereotypical bias in pretrained language models
Moin Nadeem
Anna Bethke
Siva Reddy
101
1,014
0
20 Apr 2020
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive
  Summarization
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
Jingqing Zhang
Yao-Min Zhao
Mohammad Saleh
Peter J. Liu
RALM3DGS
297
2,051
0
18 Dec 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language
  Generation, Translation, and Comprehension
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMatVLM
264
10,851
0
29 Oct 2019
Measuring Bias in Contextualized Word Representations
Measuring Bias in Contextualized Word Representations
Keita Kurita
Nidhi Vyas
Ayush Pareek
A. Black
Yulia Tsvetkov
106
451
0
18 Jun 2019
Bias in Bios: A Case Study of Semantic Representation Bias in a
  High-Stakes Setting
Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting
Maria De-Arteaga
Alexey Romanov
Hanna M. Wallach
J. Chayes
C. Borgs
Alexandra Chouldechova
S. Geyik
K. Kenthapadi
Adam Tauman Kalai
194
460
0
27 Jan 2019
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
Jieyu Zhao
Tianlu Wang
Mark Yatskar
Vicente Ordonez
Kai-Wei Chang
130
942
0
18 Apr 2018
Semantics derived automatically from language corpora contain human-like
  biases
Semantics derived automatically from language corpora contain human-like biases
Aylin Caliskan
J. Bryson
Arvind Narayanan
215
2,673
0
25 Aug 2016
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word
  Embeddings
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Tolga Bolukbasi
Kai-Wei Chang
James Zou
Venkatesh Saligrama
Adam Kalai
CVBMFaML
112
3,150
0
21 Jul 2016
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
Piotr Bojanowski
Edouard Grave
Armand Joulin
Tomas Mikolov
NAISSLVLM
232
9,980
0
15 Jul 2016
Improved Techniques for Training GANs
Improved Techniques for Training GANs
Tim Salimans
Ian Goodfellow
Wojciech Zaremba
Vicki Cheung
Alec Radford
Xi Chen
GAN
486
9,067
0
10 Jun 2016
Efficient Estimation of Word Representations in Vector Space
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
686
31,544
0
16 Jan 2013
1