Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.13219
Cited By
Towards Understanding and Mitigating Social Biases in Language Models
24 June 2021
Paul Pu Liang
Chiyu Wu
Louis-Philippe Morency
Ruslan Salakhutdinov
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Towards Understanding and Mitigating Social Biases in Language Models"
50 / 51 papers shown
Title
GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction
Mohammadtaha Bagherifard
Sahar Rajabi
Ali Edalat
Yadollah Yaghoobzadeh
KELM
17
0
0
16 May 2025
An AI-Powered Research Assistant in the Lab: A Practical Guide for Text Analysis Through Iterative Collaboration with LLMs
Gino Carmona-Díaz
William Jiménez-Leal
María Alejandra Grisales
Chandra Sripada
Santiago Amaya
Michael Inzlicht
Juan Pablo Bermúdez
24
0
0
14 May 2025
AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How
Omid Veisi
Sasan Bahrami
Roman Englert
Claudia Müller
120
0
0
25 Apr 2025
Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
Rijul Magu
Arka Dutta
Sean Kim
Ashiqur R. KhudaBukhsh
Munmun De Choudhury
21
0
0
08 Apr 2025
Fair Text Classification via Transferable Representations
Thibaud Leteno
Michael Perrot
Charlotte Laclau
Antoine Gourru
Christophe Gravier
FaML
88
0
0
10 Mar 2025
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models
Panatchakorn Anantaprayoon
Masahiro Kaneko
Naoaki Okazaki
LRM
KELM
50
0
0
08 Mar 2025
AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County
Faiz Surani
Mirac Suzgun
Vyoma Raman
Christopher D. Manning
Peter Henderson
Daniel E. Ho
41
0
0
12 Feb 2025
Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs
Angelina Wang
Michelle Phan
Daniel E. Ho
Sanmi Koyejo
51
2
0
04 Feb 2025
Enhancing Privacy in the Early Detection of Sexual Predators Through Federated Learning and Differential Privacy
Khaoula Chehbouni
Martine De Cock
Gilles Caporossi
Afaf Taik
Reihaneh Rabbany
G. Farnadi
73
0
0
21 Jan 2025
Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals
Qingyang Wu
Ying Xu
Tingsong Xiao
Yunze Xiao
Yitong Li
...
Yichi Zhang
Shanghai Zhong
Yuwei Zhang
Wei Lu
Yifan Yang
78
2
0
17 Jan 2025
The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection
Tomas Horych
Christoph Mandl
Terry Ruas
André Greiner-Petter
Bela Gipp
Akiko Aizawa
Timo Spinde
96
4
0
17 Nov 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
38
0
06 Jun 2024
Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness
Guangliang Liu
Milad Afshari
Xitong Zhang
Zhiyu Xue
Avrajit Ghosh
Bidhan Bashyal
Rongrong Wang
K. Johnson
27
0
0
06 Jun 2024
Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models
Jisu Shin
Hoyun Song
Huije Lee
Soyeong Jeong
Jong C. Park
38
6
0
06 Jun 2024
Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models
Paula Akemi Aoyagui
Sharon Ferguson
Anastasia Kuzminykh
50
0
0
17 May 2024
Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes
Damin Zhang
Yi Zhang
Geetanjali Bihani
Julia Taylor Rayz
53
2
0
06 May 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
58
31
0
08 Apr 2024
The Impact of Unstated Norms in Bias Analysis of Language Models
Farnaz Kohankhaki
D. B. Emerson
David B. Emerson
Laleh Seyyed-Kalantari
Faiza Khan Khattak
57
1
0
04 Apr 2024
An Early Categorization of Prompt Injection Attacks on Large Language Models
Sippo Rossi
Alisia Marianne Michel
R. Mukkamala
J. Thatcher
SILM
AAML
26
16
0
31 Jan 2024
Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting
Masahiro Kaneko
Danushka Bollegala
Naoaki Okazaki
Timothy Baldwin
LRM
34
27
0
28 Jan 2024
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas L. Griffiths
LLMAG
LM&Ro
48
152
0
05 Sep 2023
FairMonitor: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models
Yanhong Bai
Jiabao Zhao
Jinxin Shi
Tingjiang Wei
Xingjiao Wu
Liangbo He
36
0
0
21 Aug 2023
AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models
Siheng Li
Cheng Yang
Yichun Yin
Xinyu Zhu
Ze-Long Cheng
Lifeng Shang
Xin Jiang
Qun Liu
Yujiu Yang
SyDa
27
3
0
12 Aug 2023
Opinion Mining Using Population-tuned Generative Language Models
Allmin Pradhap Singh Susaiyah
Abhinay Pandya
Aki Härmä
15
0
0
24 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
58
119
0
01 Jul 2023
A Trip Towards Fairness: Bias and De-Biasing in Large Language Models
Leonardo Ranaldi
Elena Sofia Ruzzetti
Davide Venditti
Dario Onorati
Fabio Massimo Zanzotto
32
34
0
23 May 2023
ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five other Low-Resource Languages
Sourojit Ghosh
Aylin Caliskan
33
69
0
17 May 2023
Entity-Based Evaluation of Political Bias in Automatic Summarization
Karen Zhou
Chenhao Tan
35
1
0
03 May 2023
DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach
Bilal Ghanem
Alona Fyshe
24
4
0
10 Apr 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
29
506
0
07 Mar 2023
Geographic and Geopolitical Biases of Language Models
Fahim Faisal
Antonios Anastasopoulos
20
20
0
20 Dec 2022
Manifestations of Xenophobia in AI Systems
Nenad Tomašev
J. L. Maynard
Iason Gabriel
24
9
0
15 Dec 2022
A Simple, Yet Effective Approach to Finding Biases in Code Generation
Spyridon Mouselinos
Mateusz Malinowski
Henryk Michalewski
10
7
0
31 Oct 2022
Mutual Information Alleviates Hallucinations in Abstractive Summarization
Liam van der Poel
Ryan Cotterell
Clara Meister
HILM
13
56
0
24 Oct 2022
COFFEE: Counterfactual Fairness for Personalized Text Generation in Explainable Recommendation
Nan Wang
Qifan Wang
Yi-Chia Wang
Maziar Sanjabi
Jingzhou Liu
Hamed Firooz
Hongning Wang
Shaoliang Nie
28
6
0
14 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
444
0
23 Aug 2022
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan
Shankar Ananthakrishnan
Jack G. M. FitzGerald
Rahul Gupta
Wael Hamza
...
Mukund Sridhar
Fabian Triefenbach
Apurv Verma
Gökhan Tür
Premkumar Natarajan
51
82
0
02 Aug 2022
Probing via Prompting
Jiaoda Li
Ryan Cotterell
Mrinmaya Sachan
29
13
0
04 Jul 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
25
49
0
16 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
17
50
0
01 Jun 2022
Learning Disentangled Textual Representations via Statistical Measures of Similarity
Pierre Colombo
Guillaume Staerman
Nathan Noiry
Pablo Piantanida
FaML
DRL
38
22
0
07 May 2022
Contrastive Learning for Prompt-Based Few-Shot Language Learners
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
18
32
0
03 May 2022
Embedding Hallucination for Few-Shot Language Fine-tuning
Yiren Jian
Chongyang Gao
Soroush Vosoughi
23
4
0
03 May 2022
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM
OSLM
AI4CE
54
3,488
0
02 May 2022
Detoxifying Language Models with a Toxic Corpus
Yoon A Park
Frank Rudzicz
19
6
0
30 Apr 2022
QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance
Xiaoqiang Wang
Bang Liu
Siliang Tang
Lingfei Wu
30
9
0
29 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models
Pieter Delobelle
E. Tokpo
T. Calders
Bettina Berendt
19
25
0
14 Dec 2021
Latent Space Smoothing for Individually Fair Representations
Momchil Peychev
Anian Ruoss
Mislav Balunović
Maximilian Baader
Martin Vechev
FaML
36
19
0
26 Nov 2021
An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models
Nicholas Meade
Elinor Poole-Dayan
Siva Reddy
22
123
0
16 Oct 2021
1
2
Next