Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2104.06390
Cited By
Detoxifying Language Models Risks Marginalizing Minority Voices
13 April 2021
Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Detoxifying Language Models Risks Marginalizing Minority Voices"
36 / 86 papers shown
Title
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
76
786
0
30 Mar 2023
Automatically Auditing Large Language Models via Discrete Optimization
Erik Jones
Anca Dragan
Aditi Raghunathan
Jacob Steinhardt
48
158
0
08 Mar 2023
A Pathway Towards Responsible AI Generated Content
Chen Chen
Jie Fu
Lingjuan Lyu
49
71
0
02 Mar 2023
Systematic Rectification of Language Models via Dead-end Analysis
Mengyao Cao
Mehdi Fatemi
Jackie C.K. Cheung
Samira Shabanian
KELM
32
15
0
27 Feb 2023
Pretraining Language Models with Human Preferences
Tomasz Korbak
Kejian Shi
Angelica Chen
Rasika Bhalerao
C. L. Buckley
Jason Phang
Sam Bowman
Ethan Perez
ALM
SyDa
36
207
0
16 Feb 2023
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models
Shrimai Prabhumoye
M. Patwary
M. Shoeybi
Bryan Catanzaro
LM&MA
30
19
0
14 Feb 2023
Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions
Marwan Omar
SILM
AAML
33
20
0
14 Feb 2023
The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation
Jochen Hartmann
Jasper Schwenzow
Maximilian Witte
22
203
0
05 Jan 2023
Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts
Skyler Hallinan
Alisa Liu
Yejin Choi
Maarten Sap
22
36
0
20 Dec 2022
Controllable Text Generation with Language Constraints
Howard Chen
Huihan Li
Danqi Chen
Karthik Narasimhan
14
16
0
20 Dec 2022
Robosourcing Educational Resources -- Leveraging Large Language Models for Learnersourcing
Paul Denny
Sami Sarsa
Arto Hellas
Juho Leinonen
AI4Ed
10
35
0
09 Nov 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
73
2,999
0
20 Oct 2022
Language Detoxification with Attribute-Discriminative Latent Space
Jin Myung Kwak
Minseon Kim
Sung Ju Hwang
33
12
0
19 Oct 2022
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Sachin Kumar
Vidhisha Balachandran
Lucille Njoo
Antonios Anastasopoulos
Yulia Tsvetkov
ELM
77
85
0
14 Oct 2022
The State of Profanity Obfuscation in Natural Language Processing
Debora Nozza
Dirk Hovy
42
7
0
14 Oct 2022
Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization
Zonghan Yang
Xiaoyuan Yi
Peng Li
Yang Liu
Xing Xie
33
33
0
10 Oct 2022
Toxicity in Multilingual Machine Translation at Scale
Marta R. Costa-jussá
Eric Michael Smith
C. Ropers
Daniel Licht
Jean Maillard
Javier Ferrando
Carlos Escolano
30
25
0
06 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
506
0
28 Sep 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
30
49
0
16 Jun 2022
Towards a Deep Multi-layered Dialectal Language Analysis: A Case Study of African-American English
Jamell Dacon
21
5
0
03 Jun 2022
Metaethical Perspectives on 'Benchmarking' AI Ethics
Travis LaCroix
A. Luccioni
27
7
0
11 Apr 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
42
1,846
0
29 Mar 2022
Mix and Match: Learning-free Controllable Text Generation using Energy Language Models
Fatemehsadat Mireshghallah
Kartik Goyal
Taylor Berg-Kirkpatrick
36
78
0
24 Mar 2022
Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal
Umang Gupta
Jwala Dhamala
Varun Kumar
Apurv Verma
Yada Pruksachatkun
Satyapriya Krishna
Rahul Gupta
Kai-Wei Chang
Greg Ver Steeg
Aram Galstyan
18
50
0
23 Mar 2022
Suum Cuique: Studying Bias in Taboo Detection with a Community Perspective
Osama Khalid
Jonathan Rusert
P. Srinivasan
11
1
0
22 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
33
350
0
17 Mar 2022
Leashing the Inner Demons: Self-Detoxification for Language Models
Canwen Xu
Zexue He
Zhankui He
Julian McAuley
20
26
0
06 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
Reward Modeling for Mitigating Toxicity in Transformer-based Language Models
Farshid Faal
K. Schmitt
Jia Yuan Yu
13
25
0
19 Feb 2022
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Wei Ping
Ming-Yu Liu
Chaowei Xiao
P. Xu
M. Patwary
M. Shoeybi
Bo-wen Li
Anima Anandkumar
Bryan Catanzaro
25
64
0
08 Feb 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
13
610
0
07 Feb 2022
Ethical and social risks of harm from Language Models
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
...
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
35
977
0
08 Dec 2021
Time Waits for No One! Analysis and Challenges of Temporal Misalignment
Kelvin Luu
Daniel Khashabi
Suchin Gururangan
Karishma Mandyam
Noah A. Smith
27
85
0
14 Nov 2021
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
250
193
0
15 Sep 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
43
105
0
07 Jul 2021
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
Irene Solaiman
Christy Dennison
30
222
0
18 Jun 2021
Previous
1
2