2103.00453
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
28 February 2021
Timo Schick
Sahana Udupa
Hinrich Schütze
Papers citing
"Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP"
50 / 256 papers shown
Title
Syntactic Control of Language Models by Posterior Inference
Vicky Xefteri
Tim Vieira
Ryan Cotterell
Afra Amini
17
0
0
08 Jun 2025
Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model
Yuanhe Tian
Mingjie Deng
Guoqing Jin
Yan Song
MU
KELM
63
0
0
02 Jun 2025
Something Just Like TRuST: Toxicity Recognition of Span and Target
Berk Atil
Namrata Sureddy
R. Passonneau
33
0
0
02 Jun 2025
Paying Alignment Tax with Contrastive Learning
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Antonio del Rio Chanona
85
0
0
25 May 2025
Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing
Zhouhao Sun
Zhiyuan Kan
Xiao Ding
Li Du
Yang Zhao
Bing Qin
Ting Liu
121
0
0
22 May 2025
Relative Bias: A Comparative Framework for Quantifying Bias in LLMs
Alireza Arbabi
Florian Kerschbaum
209
0
0
22 May 2025
Semantic Probabilistic Control of Language Models
Kareem Ahmed
Catarina G Belém
Padhraic Smyth
Sameer Singh
119
1
0
04 May 2025
Mitigating Group-Level Fairness Disparities in Federated Visual Language Models
Chaomeng Chen
Zitong Yu
Jin Song Dong
Sen Su
Linlin Shen
Shutao Xia
Xiaochun Cao
FedML
VLM
466
0
0
03 May 2025
Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification
Takuma Udagawa
Yang Zhao
H. Kanayama
Bishwaranjan Bhattacharjee
65
0
0
19 Apr 2025
Information Gain-Guided Causal Intervention for Autonomous Debiasing Large Language Models
Zhouhao Sun
Xiao Ding
Li Du
Yunpeng Xu
Yixuan Ma
Yang Zhao
Bing Qin
Ting Liu
97
0
0
17 Apr 2025
Bias Beyond English: Evaluating Social Bias and Debiasing Methods in a Low-Resource Setting
Ej Zhou
Weiming Lu
66
0
0
15 Apr 2025
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Sanchit Kabra
Akshita Jha
Chandan K. Reddy
LRM
174
1
0
08 Apr 2025
Cognitive Debiasing Large Language Models for Decision-Making
Yougang Lyu
Shijie Ren
Yue Feng
Zihan Wang
Zhongfu Chen
Zhaochun Ren
Maarten de Rijke
270
0
0
05 Apr 2025
DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models
Suyoung Bae
YunSeok Choi
Jee-Hyong Lee
73
0
0
25 Mar 2025
Through the LLM Looking Glass: A Socratic Probing of Donkeys, Elephants, and Markets
Molly Kennedy
Ayyoob Imani
Timo Spinde
Hinrich Schütze
100
1
0
20 Mar 2025
Augmented Adversarial Trigger Learning
Zhe Wang
Yanjun Qi
96
0
0
16 Mar 2025
Palette of Language Models: A Solver for Controlled Text Generation
Zhe Yang
Yi Huang
Yaqin Chen
Xiaoting Wu
Junlan Feng
Chao Deng
84
0
0
14 Mar 2025
Rethinking Prompt-based Debiasing in Large Language Models
Xinyi Yang
Runzhe Zhan
Derek F. Wong
Shu Yang
Junchao Wu
Lidia S. Chao
ALM
181
1
0
12 Mar 2025
BiasEdit: Debiasing Stereotyped Language Models via Model Editing
Xin Xu
Wei Xu
N. Zhang
Julian McAuley
KELM
134
1
0
11 Mar 2025
Gender Encoding Patterns in Pretrained Language Model Representations
Mahdi Zakizadeh
Mohammad Taher Pilehvar
216
0
0
09 Mar 2025
Red Team Diffuser: Exposing Toxic Continuation Vulnerabilities in Vision-Language Models via Reinforcement Learning
Ruofan Wang
Xiang Zheng
Xinyu Wang
Cong Wang
Jie Zhang
VLM
73
0
0
08 Mar 2025
Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection
Maximilian Spliethöver
Tim Knebler
Fabian Fumagalli
Maximilian Muschalik
Barbara Hammer
Eyke Hüllermeier
Henning Wachsmuth
194
1
0
10 Feb 2025
Addressing Bias in Generative AI: Challenges and Research Opportunities in Information Management
Xiahua Wei
Naveen Kumar
Han Zhang
135
8
0
22 Jan 2025
Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Evangelos Anagnostopoulos
Christos Varytimidis
Antonio del Rio Chanona
80
0
0
13 Jan 2025
Brain Ageing Prediction using Isolation Forest Technique and Residual Neural Network (ResNet)
Saadat Behzadi
Danial Sharifrazi
R. Alizadehsani
Mojtaba Lotfaliany
Mohammadreza Mohebbi
74
0
0
26 Dec 2024
Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach
Daiki Shirafuji
Makoto Takenaka
Shinya Taguchi
LLMAG
130
1
0
16 Dec 2024
Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models
S. Tong
Eliott Zemour
Rawisara Lohanimit
Lalana Kagal
104
0
0
02 Dec 2024
How far can bias go? -- Tracing bias from pretraining data to alignment
Marion Thaler
Abdullatif Köksal
Alina Leidinger
Anna Korhonen
Hinrich Schütze
157
1
0
28 Nov 2024
Joint Vision-Language Social Bias Removal for CLIP
Haoyu Zhang
Yangyang Guo
Mohan S. Kankanhalli
VLM
199
1
0
19 Nov 2024
Bias in Large Language Models: Origin, Evaluation, and Mitigation
Yufei Guo
Muzhe Guo
Juntao Su
Zhou Yang
Mengqiu Zhu
Hongfei Li
Mengyang Qiu
Shuo Shuo Liu
AILaw
113
22
0
16 Nov 2024
Smaller Large Language Models Can Do Moral Self-Correction
Guangliang Liu
Zhiyu Xue
Rongrong Wang
Kristen Marie Johnson
LRM
107
0
0
30 Oct 2024
RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework
Yifan Wang
Vera Demberg
72
1
0
24 Oct 2024
CogSteer: Cognition-Inspired Selective Layer Intervention for Efficiently Steering Large Language Models
Xintong Wang
Jingheng Pan
Longqin Jiang
Liang Ding
Xingshan Li
Chris Biemann
LLMSV
92
0
0
23 Oct 2024
A Novel Interpretability Metric for Explaining Bias in Language Models: Applications on Multilingual Models from Southeast Asia
Lance Calvin Lim Gamboa
Mark Lee
67
1
0
20 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless
Nikolas Vitsakis
Zeerak Talat
James Garforth
Björn Ross
Arno Onken
Atoosa Kasirzadeh
Alexandra Birch
78
1
0
17 Oct 2024
Persona Knowledge-Aligned Prompt Tuning Method for Online Debate
Chunkit Chan
Cheng Jiayang
Xin Liu
Yauwai Yim
Yuxin Jiang
Zheye Deng
Haoran Li
Yangqiu Song
Ginny Wong
Simon See
114
0
0
05 Oct 2024
Large Language Models can be Strong Self-Detoxifiers
Ching-Yun Ko
Pin-Yu Chen
Payel Das
Youssef Mroueh
Soham Dan
Georgios Kollias
Subhajit Chaudhury
Tejaswini Pedapati
Luca Daniel
73
3
0
04 Oct 2024
REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning
Rameez Qureshi
Naim Es-Sebbani
Luis Galárraga
Yvette Graham
Miguel Couceiro
Zied Bouraoui
66
1
0
18 Aug 2024
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs
Do Xuan Long
Hai Nguyen Ngoc
Tiviatis Sim
Hieu Dao
Shafiq Joty
Kenji Kawaguchi
Nancy F. Chen
Min-Yen Kan
135
11
0
16 Aug 2024
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
Hila Gonen
Terra Blevins
Alisa Liu
Luke Zettlemoyer
Noah A. Smith
145
5
0
12 Aug 2024
Prompt and Prejudice
Lorenzo Berlincioni
Luca Cultrera
Federico Becattini
Marco Bertini
A. Bimbo
73
0
0
07 Aug 2024
Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation
Huimin Lu
Masaru Isonuma
Junichiro Mori
Ichiro Sakata
MU
74
1
0
24 Jul 2024
Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis
Guang-Da Liu
Haitao Mao
Jiliang Tang
K. Johnson
LRM
97
8
0
21 Jul 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
132
7
0
11 Jul 2024
ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization
Wai Man Si
Michael Backes
Yang Zhang
78
1
0
09 Jul 2024
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua
Yun Yvonna Li
Shiyi Yang
Chen Wang
Lina Yao
LM&MA
102
19
0
06 Jul 2024
Social Bias Evaluation for Large Language Models Requires Prompt Variations
Rem Hida
Masahiro Kaneko
Naoaki Okazaki
116
20
0
03 Jul 2024
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
Luísa Shimabucoro
Sebastian Ruder
Julia Kreutzer
Marzieh Fadaee
Sara Hooker
SyDa
74
5
0
01 Jul 2024
Fairness and Bias in Multimodal AI: A Survey
Tosin Adewumi
Lama Alkhaled
Namrata Gurung
G. V. Boven
Irene Pagliai
119
10
0
27 Jun 2024
MD tree: a model-diagnostic tree grown on loss landscape
Yefan Zhou
Jianlong Chen
Qinxue Cao
Konstantin Schürholt
Yaoqing Yang
102
2
0
24 Jun 2024