ResearchTrend.AI
arXiv: 2312.03689
Evaluating and Mitigating Discrimination in Language Model Decisions

6 December 2023
Alex Tamkin
Amanda Askell
Liane Lovitt
Esin Durmus
Nicholas Joseph
Shauna Kravec
Karina Nguyen
Jared Kaplan
Deep Ganguli

Papers citing "Evaluating and Mitigating Discrimination in Language Model Decisions"

18 / 18 papers shown
GenderBench: Evaluation Suite for Gender Biases in LLMs
Matúš Pikuliak
79
0
0
17 May 2025
Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
Judy Hanwen Shen
Carlos Guestrin
185
1
0
09 Apr 2025
Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs
Angelina Wang
Michelle Phan
Daniel E. Ho
Sanmi Koyejo
115
2
0
04 Feb 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Chaoqi Wang
Zhuokai Zhao
Yibo Jiang
Zhaorun Chen
Chen Zhu
...
Jiayi Liu
Lizhu Zhang
Xiangjun Fan
Hao Ma
Sinong Wang
136
5
0
16 Jan 2025
Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Roberto-Rafael Maura-Rivero
Chirag Nagpal
Roma Patel
Francesco Visin
108
1
0
08 Jan 2025
Collapsed Language Models Promote Fairness
Jingxuan Xu
Wuyang Chen
Linyi Li
Yao Zhao
Yunchao Wei
97
0
0
06 Oct 2024
Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios
Vishal Mirza
Rahul Kulkarni
Aakanksha Jadhav
108
2
0
22 Sep 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM, KELM
156
39
0
08 Apr 2024
Eliciting Human Preferences with Language Models
Belinda Z. Li
Alex Tamkin
Noah D. Goodman
Jacob Andreas
RALM
80
51
0
17 Oct 2023
Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization?
Rishi Bommasani
Kathleen A. Creel
Ananya Kumar
Dan Jurafsky
Percy Liang
65
86
0
25 Nov 2022
The Fallacy of AI Functionality
Inioluwa Deborah Raji
Indra Elizabeth Kumar
Aaron Horowitz
Andrew D. Selbst
71
188
0
20 Jun 2022
Self-critiquing models for assisting human evaluators
William Saunders
Catherine Yeh
Jeff Wu
Steven Bills
Long Ouyang
Jonathan Ward
Jan Leike
ALM, ELM
109
306
0
12 Jun 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
180
667
0
07 Feb 2022
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
Ari Holtzman
Peter West
Vered Shwartz
Yejin Choi
Luke Zettlemoyer
LRM
108
239
0
16 Apr 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
313
387
0
28 Feb 2021
Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits
Jack Bandy
MLAU
58
112
0
03 Feb 2021
Avoiding Discrimination through Causal Reasoning
Niki Kilbertus
Mateo Rojas-Carulla
Giambattista Parascandolo
Moritz Hardt
Dominik Janzing
Bernhard Schölkopf
FaML, CML
115
584
0
08 Jun 2017
Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination
Amit Datta
Michael Carl Tschantz
Anupam Datta
79
735
0
27 Aug 2014