2312.03689
Evaluating and Mitigating Discrimination in Language Model Decisions
6 December 2023
Alex Tamkin
Amanda Askell
Liane Lovitt
Esin Durmus
Nicholas Joseph
Shauna Kravec
Karina Nguyen
Jared Kaplan
Deep Ganguli
Papers citing
"Evaluating and Mitigating Discrimination in Language Model Decisions"
18 / 18 papers shown
Title
GenderBench: Evaluation Suite for Gender Biases in LLMs
Matúš Pikuliak
79
0
0
17 May 2025
Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
Judy Hanwen Shen
Carlos Guestrin
185
1
0
09 Apr 2025
Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs
Angelina Wang
Michelle Phan
Daniel E. Ho
Sanmi Koyejo
115
2
0
04 Feb 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Chaoqi Wang
Zhuokai Zhao
Yibo Jiang
Zhaorun Chen
Chen Zhu
...
Jiayi Liu
Lizhu Zhang
Xiangjun Fan
Hao Ma
Sinong Wang
136
5
0
16 Jan 2025
Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Roberto-Rafael Maura-Rivero
Chirag Nagpal
Roma Patel
Francesco Visin
108
1
0
08 Jan 2025
Collapsed Language Models Promote Fairness
Jingxuan Xu
Wuyang Chen
Linyi Li
Yao Zhao
Yunchao Wei
97
0
0
06 Oct 2024
Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios
Vishal Mirza
Rahul Kulkarni
Aakanksha Jadhav
108
2
0
22 Sep 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
156
39
0
08 Apr 2024
Eliciting Human Preferences with Language Models
Belinda Z. Li
Alex Tamkin
Noah D. Goodman
Jacob Andreas
RALM
80
51
0
17 Oct 2023
Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization?
Rishi Bommasani
Kathleen A. Creel
Ananya Kumar
Dan Jurafsky
Percy Liang
65
86
0
25 Nov 2022
The Fallacy of AI Functionality
Inioluwa Deborah Raji
Indra Elizabeth Kumar
Aaron Horowitz
Andrew D. Selbst
71
188
0
20 Jun 2022
Self-critiquing models for assisting human evaluators
William Saunders
Catherine Yeh
Jeff Wu
Steven Bills
Long Ouyang
Jonathan Ward
Jan Leike
ALM
ELM
109
306
0
12 Jun 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
180
667
0
07 Feb 2022
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
Ari Holtzman
Peter West
Vered Shwartz
Yejin Choi
Luke Zettlemoyer
LRM
108
239
0
16 Apr 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
313
387
0
28 Feb 2021
Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits
Jack Bandy
MLAU
58
112
0
03 Feb 2021
Avoiding Discrimination through Causal Reasoning
Niki Kilbertus
Mateo Rojas-Carulla
Giambattista Parascandolo
Moritz Hardt
Dominik Janzing
Bernhard Schölkopf
FaML
CML
115
584
0
08 Jun 2017
Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination
Amit Datta
Michael Carl Tschantz
Anupam Datta
79
735
0
27 Aug 2014