Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00453
Cited By
v1
v2 (latest)
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
28 February 2021
Timo Schick
Sahana Udupa
Hinrich Schütze
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP"
50 / 256 papers shown
Title
Diagnosing and Debiasing Corpus-Based Political Bias and Insults in GPT2
Ambri Ma
Arnav Kumar
Brett Zeligson
23
1
0
17 Nov 2023
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Li Xu
Jun Liu
MU
138
0
0
16 Nov 2023
Prompt-based Pseudo-labeling Strategy for Sample-Efficient Semi-Supervised Extractive Summarization
Gaurav Sahu
Olga Vechtomova
I. Laradji
83
1
0
16 Nov 2023
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Y. Zou
144
117
0
11 Nov 2023
All Should Be Equal in the Eyes of Language Models: Counterfactually Aware Fair Text Generation
Pragyan Banerjee
Abhinav Java
Surgan Jandial
Simra Shahid
Shaz Furniturewala
Balaji Krishnamurthy
S. Bhatia
67
3
0
09 Nov 2023
Successor Features for Efficient Multisubject Controlled Text Generation
Mengyao Cao
Mehdi Fatemi
Jackie Chi Kit Cheung
Samira Shabanian
BDL
89
0
0
03 Nov 2023
LLMaAA: Making Large Language Models as Active Annotators
Ruoyu Zhang
Yanzeng Li
Yongliang Ma
Ming Zhou
Lei Zou
109
74
0
30 Oct 2023
Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi
Jing Yao
Xiting Wang
Xing Xie
82
13
0
26 Oct 2023
Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting
Preethi Lahoti
Nicholas Blumm
Xiao Ma
Raghavendra Kotikalapudi
Sahitya Potluri
...
Hansa Srinivasan
Ben Packer
Ahmad Beirami
Alex Beutel
Jilin Chen
114
32
0
25 Oct 2023
A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models
Yuanfeng Song
Yuanqin He
Xuefang Zhao
Hanlin Gu
Di Jiang
Haijun Yang
Lixin Fan
Qiang Yang
76
6
0
24 Oct 2023
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation
Gaurav Sahu
Olga Vechtomova
Dzmitry Bahdanau
I. Laradji
VLM
112
27
0
22 Oct 2023
StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large Language Models
Sullam Jeoung
Yubin Ge
Jana Diesner
73
5
0
20 Oct 2023
A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models
Yi Zhou
Jose Camacho-Collados
Danushka Bollegala
164
6
0
19 Oct 2023
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
Abhijith Chintam
Rahel Beloch
Willem H. Zuidema
Michael Hanna
Oskar van der Wal
89
18
0
19 Oct 2023
Fast Model Debias with Machine Unlearning
Ruizhe Chen
Jianfei Yang
Huimin Xiong
Jianhong Bai
Tianxiang Hu
Jinxiang Hao
Yang Feng
Qiufeng Wang
Jian Wu
Zuo-Qiang Liu
MU
123
69
0
19 Oct 2023
Co
2
^2
2
PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning
Xiangjue Dong
Ziwei Zhu
Zhuoer Wang
Maria Teleki
James Caverlee
111
11
0
19 Oct 2023
A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation
Giuseppe Attanasio
Flor Miriam Plaza del Arco
Debora Nozza
Anne Lauscher
72
19
0
18 Oct 2023
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr
119
362
0
17 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
118
45
0
16 Oct 2023
Self-Detoxifying Language Models via Toxification Reversal
Chak Tou Leong
Yi Cheng
Jiashuo Wang
Jian Wang
Wenjie Li
MU
52
38
0
14 Oct 2023
"Im not Racist but...": Discovering Bias in the Internal Knowledge of Large Language Models
A. Salinas
Louis Penafiel
Robert McCormack
Fred Morstatter
52
5
0
13 Oct 2023
Unlocking Bias Detection: Leveraging Transformer-Based Models for Content Analysis
Shaina Raza
Oluwanifemi Bamgbose
Veronica Chatrath
Shardul Ghuge
Yan Sidyakin
Abdullah Y. Muaad
89
13
0
30 Sep 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
115
207
0
26 Sep 2023
Watch Your Language: Investigating Content Moderation with Large Language Models
Deepak Kumar
Y. AbuHashem
Zakir Durumeric
AI4MH
99
19
0
25 Sep 2023
Survey of Social Bias in Vision-Language Models
Nayeon Lee
Yejin Bang
Holy Lovenia
Samuel Cahyawijaya
Wenliang Dai
Pascale Fung
VLM
132
19
0
24 Sep 2023
Learning by Self-Explaining
Wolfgang Stammer
Felix Friedrich
David Steinmann
Manuel Brack
Hikaru Shindo
Kristian Kersting
138
12
0
15 Sep 2023
In-Contextual Gender Bias Suppression for Large Language Models
Daisuke Oba
Masahiro Kaneko
Danushka Bollegala
88
9
0
13 Sep 2023
Detecting Natural Language Biases with Prompt-based Learning
Md Abdul Aowal
Maliha T Islam
P. Mammen
Sandesh Shetty
66
1
0
11 Sep 2023
OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs
Patrick Haller
Ansar Aynetdinov
Alan Akbik
81
26
0
07 Sep 2023
Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos
Ryan Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
140
612
0
02 Sep 2023
CMD: a framework for Context-aware Model self-Detoxification
Zecheng Tang
Keyan Zhou
Juntao Li
Yuyang Ding
Pinzheng Wang
Bowen Yan
Minzhang
MU
65
5
0
16 Aug 2023
Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes
Fabian Galetzka
Anne Beyer
David Schlangen
AI4CE
86
1
0
11 Aug 2023
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content
Xinlei He
Savvas Zannettou
Yun Shen
Yang Zhang
CLL
53
43
0
10 Aug 2023
XNLP: An Interactive Demonstration System for Universal Structured NLP
Hao Fei
Meishan Zhang
Hao Fei
Tat-Seng Chua
95
1
0
03 Aug 2023
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
Mingyuan Fan
Chengyu Wang
Cen Chen
Yang Liu
Jun Huang
HILM
98
3
0
31 Jul 2023
Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models
Somayeh Ghanbarzadeh
Yan-ping Huang
Hamid Palangi
R. C. Moreno
Hamed Khanpour
78
12
0
20 Jul 2023
Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement
Hui Yuan
Kaixuan Huang
Chengzhuo Ni
Minshuo Chen
Mengdi Wang
DiffM
94
37
0
13 Jul 2023
Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
Sanghyun Kim
Seohyeong Jung
Balhae Kim
Moonseok Choi
Jinwoo Shin
Juho Lee
DiffM
64
30
0
12 Jul 2023
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
Jonathan Pei
Kevin Kaichuang Yang
Dan Klein
125
21
0
06 Jul 2023
On Evaluating and Mitigating Gender Biases in Multilingual Settings
Aniket Vashishtha
Kabir Ahuja
Sunayana Sitaram
95
26
0
04 Jul 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi
Kaixuan Huang
Ashwinee Panda
Peter Henderson
Mengdi Wang
Prateek Mittal
AAML
131
173
0
22 Jun 2023
Sociodemographic Bias in Language Models: A Survey and Forward Path
Vipul Gupta
Pranav Narayanan Venkit
Shomir Wilson
R. Passonneau
97
23
0
13 Jun 2023
AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks
Alexander Tornede
Difan Deng
Theresa Eimer
Joseph Giovanelli
Aditya Mohan
...
Sarah Segel
Daphne Theodorakopoulos
Tanja Tornede
Henning Wachsmuth
Marius Lindauer
119
24
0
13 Jun 2023
Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions
Himanshu Thakur
Atishay Jain
Praneetha Vaddamanu
Paul Pu Liang
Louis-Philippe Morency
111
39
0
07 Jun 2023
An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models
Zhongbin Xie
Thomas Lukasiewicz
65
13
0
06 Jun 2023
Structured Voronoi Sampling
Afra Amini
Li Du
Ryan Cotterell
DiffM
101
2
0
05 Jun 2023
PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models
Hao Li
Yuping Wu
Viktor Schlegel
Riza Batista-Navarro
Thanh-Tung Nguyen
Abhinav Ramesh Kashyap
Xiaojun Zeng
Daniel Beck
Stefan Winkler
Goran Nenadic
LM&MA
85
9
0
05 Jun 2023
An Invariant Learning Characterization of Controlled Text Generation
Carolina Zheng
Claudia Shi
Keyon Vafa
Amir Feder
David M. Blei
OOD
103
8
0
31 May 2023
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
Myra Cheng
Esin Durmus
Dan Jurafsky
75
204
0
29 May 2023
A Practical Toolkit for Multilingual Question and Answer Generation
Asahi Ushio
Fernando Alva-Manchego
Jose Camacho-Collados
SyDa
85
14
0
27 May 2023
Previous
1
2
3
4
5
6
Next