ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.06390
  4. Cited By
Detoxifying Language Models Risks Marginalizing Minority Voices

Detoxifying Language Models Risks Marginalizing Minority Voices

13 April 2021
Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
ArXivPDFHTML

Papers citing "Detoxifying Language Models Risks Marginalizing Minority Voices"

50 / 86 papers shown
Title
TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
Gwen Yidou Weng
Benjie Wang
Mathias Niepert
BDL
155
0
0
25 Apr 2025
ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
Xiaoxuan Zhu
Zhouhong Gu
Baiqian Wu
Suhang Zheng
Tao Wang
Tianyu Li
Hongwei Feng
Yanghua Xiao
42
0
0
01 Apr 2025
What Are They Filtering Out? A Survey of Filtering Strategies for Harm Reduction in Pretraining Datasets
Marco Antonio Stranisci
Christian Hardmeier
57
0
0
17 Feb 2025
RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text
  Generation Framework
RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework
Yifan Wang
Vera Demberg
29
0
0
24 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language
  Models
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless
Nikolas Vitsakis
Zeerak Talat
James Garforth
Bjorn Ross
Arno Onken
Atoosa Kasirzadeh
Alexandra Birch
33
1
0
17 Oct 2024
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao
Wenxuan Ding
Shangbin Feng
Lucy Lu Wang
Yulia Tsvetkov
32
0
0
14 Oct 2024
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
Fangru Lin
Shaoguang Mao
Emanuele La Malfa
Valentin Hofmann
Adrian de Wynter
Jing Yao
Si-Qing Chen
Michael Wooldridge
Furu Wei
Furu Wei
51
2
0
14 Oct 2024
Large Language Models can be Strong Self-Detoxifiers
Large Language Models can be Strong Self-Detoxifiers
Ching-Yun Ko
Pin-Yu Chen
Payel Das
Youssef Mroueh
Soham Dan
Georgios Kollias
Subhajit Chaudhury
Tejaswini Pedapati
Luca Daniel
34
2
0
04 Oct 2024
Is Generative AI the Next Tactical Cyber Weapon For Threat Actors?
  Unforeseen Implications of AI Generated Cyber Attacks
Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks
Yusuf Usman
Aadesh Upadhyay
P. Gyawali
Robin Chataut
AAML
45
2
0
23 Aug 2024
Randomization Techniques to Mitigate the Risk of Copyright Infringement
Randomization Techniques to Mitigate the Risk of Copyright Infringement
Wei-Ning Chen
Peter Kairouz
Sewoong Oh
Zheng Xu
AAML
40
0
0
21 Aug 2024
Know Your Limits: A Survey of Abstention in Large Language Models
Know Your Limits: A Survey of Abstention in Large Language Models
Bingbing Wen
Jihan Yao
Shangbin Feng
Chenjun Xu
Yulia Tsvetkov
Bill Howe
Lucy Lu Wang
59
11
0
25 Jul 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons
Consent in Crisis: The Rapid Decline of the AI Data Commons
Shayne Longpre
Robert Mahari
Ariel N. Lee
Campbell Lund
Hamidah Oderinwale
...
Hanlin Li
Daphne Ippolito
Sara Hooker
Jad Kabbara
Sandy Pentland
69
36
0
20 Jul 2024
Voices in a Crowd: Searching for Clusters of Unique Perspectives
Voices in a Crowd: Searching for Clusters of Unique Perspectives
Nikolas Vitsakis
Amit Parekh
Ioannis Konstas
44
0
0
19 Jul 2024
The Sociolinguistic Foundations of Language Modeling
The Sociolinguistic Foundations of Language Modeling
Jack Grieve
Sara Bartl
Matteo Fuoli
Jason Grafmiller
Weihang Huang
A. Jawerbaum
Akira Murakami
Marcus Perlman
Dana Roemling
Bodo Winter
41
7
0
12 Jul 2024
Data, Data Everywhere: A Guide for Pretraining Dataset Construction
Data, Data Everywhere: A Guide for Pretraining Dataset Construction
Jupinder Parmar
Shrimai Prabhumoye
Joseph Jennings
Bo Liu
Aastha Jhunjhunwala
Zhilin Wang
M. Patwary
M. Shoeybi
Bryan Catanzaro
53
6
0
08 Jul 2024
Rethinking harmless refusals when fine-tuning foundation models
Rethinking harmless refusals when fine-tuning foundation models
Florin Pop
Judd Rosenblatt
Diogo Schwerz de Lucena
Michael Vaiana
18
0
0
27 Jun 2024
Towards Minimal Targeted Updates of Language Models with Targeted
  Negative Training
Towards Minimal Targeted Updates of Language Models with Targeted Negative Training
Lily H. Zhang
Rajesh Ranganath
Arya Tafvizi
33
1
0
19 Jun 2024
LIDAO: Towards Limited Interventions for Debiasing (Large) Language
  Models
LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models
Tianci Liu
Haoyu Wang
Shiyang Wang
Yu Cheng
Jing Gao
ALM
35
0
0
01 Jun 2024
A Robot Walks into a Bar: Can Language Models Serve as Creativity
  Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with
  Comedians
A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
Piotr Wojciech Mirowski
Juliette Love
K. Mathewson
Shakir Mohamed
32
20
0
31 May 2024
Harmful Speech Detection by Language Models Exhibits Gender-Queer
  Dialect Bias
Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias
Rebecca Dorn
Lee Kezar
Fred Morstatter
Kristina Lerman
32
7
0
23 May 2024
Get more for less: Principled Data Selection for Warming Up Fine-Tuning
  in LLMs
Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs
Feiyang Kang
H. Just
Yifan Sun
Himanshu Jahagirdar
Yuanzhi Zhang
Rongxing Du
Anit Kumar Sahu
Ruoxi Jia
56
18
0
05 May 2024
Language Models in Dialogue: Conversational Maxims for Human-AI
  Interactions
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions
Erik Miehling
Manish Nagireddy
P. Sattigeri
Elizabeth M. Daly
David Piorkowski
John T. Richards
ALM
42
11
0
22 Mar 2024
Recourse for reclamation: Chatting with generative language models
Recourse for reclamation: Chatting with generative language models
Jennifer Chien
Kevin R. McKee
Jackie Kay
William S. Isaac
27
0
0
21 Mar 2024
Farsight: Fostering Responsible AI Awareness During AI Application
  Prototyping
Farsight: Fostering Responsible AI Awareness During AI Application Prototyping
Zijie J. Wang
Chinmay Kulkarni
Lauren Wilcox
Michael Terry
Michael A. Madaio
40
43
0
23 Feb 2024
The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support
The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support
Inhwa Song
Sachin R. Pendse
Neha Kumar
Munmun De Choudhury
AI4MH
39
16
0
25 Jan 2024
Contrastive Perplexity for Controlled Generation: An Application in
  Detoxifying Large Language Models
Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models
T. Klein
Moin Nabi
26
1
0
16 Jan 2024
GTA: Gated Toxicity Avoidance for LM Performance Preservation
GTA: Gated Toxicity Avoidance for LM Performance Preservation
Heegyu Kim
Hyunsouk Cho
19
1
0
11 Dec 2023
A Block Metropolis-Hastings Sampler for Controllable Energy-based Text
  Generation
A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation
Jarad Forristal
Niloofar Mireshghallah
Greg Durrett
Taylor Berg-Kirkpatrick
118
4
0
07 Dec 2023
Tackling Bias in Pre-trained Language Models: Current Trends and
  Under-represented Societies
Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies
Vithya Yogarajan
Gillian Dobbie
Te Taka Keegan
R. Neuwirth
ALM
43
11
0
03 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on
  Synthetic, Interpretable Tasks
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
39
6
0
21 Nov 2023
Causal ATE Mitigates Unintended Bias in Controlled Text Generation
Causal ATE Mitigates Unintended Bias in Controlled Text Generation
Rahul Madhavan
Kahini Wadhawan
43
0
0
19 Nov 2023
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing
  & Attribution in AI
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
Shayne Longpre
Robert Mahari
Anthony Chen
Naana Obeng-Marnu
Damien Sileo
...
K. Bollacker
Tongshuang Wu
Luis Villa
Sandy Pentland
Sara Hooker
20
56
0
25 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future
  Directions
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
48
42
0
16 Oct 2023
Self-Detoxifying Language Models via Toxification Reversal
Self-Detoxifying Language Models via Toxification Reversal
Chak Tou Leong
Yi Cheng
Jiashuo Wang
Jian Wang
Wenjie Li
MU
24
30
0
14 Oct 2023
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented
  Models
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Luiza Amador Pozzobon
Beyza Ermis
Patrick Lewis
Sara Hooker
36
20
0
11 Oct 2023
Bias and Fairness in Large Language Models: A Survey
Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos
Ryan A. Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
26
490
0
02 Sep 2023
CMD: a framework for Context-aware Model self-Detoxification
CMD: a framework for Context-aware Model self-Detoxification
Zecheng Tang
Keyan Zhou
Juntao Li
Yuyang Ding
Pinzheng Wang
Bowen Yan
Minzhang
MU
23
5
0
16 Aug 2023
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
Jonathan Pei
Kevin Kaichuang Yang
Dan Klein
40
21
0
06 Jul 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi
Kaixuan Huang
Ashwinee Panda
Peter Henderson
Mengdi Wang
Prateek Mittal
AAML
25
138
0
22 Jun 2023
Evaluating the Social Impact of Generative AI Systems in Systems and
  Society
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Irene Solaiman
Zeerak Talat
William Agnew
Lama Ahmad
Dylan K. Baker
...
Marie-Therese Png
Shubham Singh
A. Strait
Lukas Struppek
Arjun Subramonian
ELM
EGVM
31
104
0
09 Jun 2023
NLPositionality: Characterizing Design Biases of Datasets and Models
NLPositionality: Characterizing Design Biases of Datasets and Models
Sebastin Santy
Jenny T Liang
Ronan Le Bras
Katharina Reinecke
Maarten Sap
32
77
0
02 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute
  Controlled Generation
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan
Rishabh Garg
Kahini Wadhawan
S. Mehta
29
5
0
01 Jun 2023
An Invariant Learning Characterization of Controlled Text Generation
An Invariant Learning Characterization of Controlled Text Generation
Carolina Zheng
Claudia Shi
Keyon Vafa
Amir Feder
David M. Blei
OOD
38
8
0
31 May 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large
  Language Model Application
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
38
28
0
28 May 2023
Psychological Metrics for Dialog System Evaluation
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. A. Schwartz
Joao Sedoc
22
2
0
24 May 2023
A Pretrainer's Guide to Training Data: Measuring the Effects of Data
  Age, Domain Coverage, Quality, & Toxicity
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Shayne Longpre
Gregory Yauney
Emily Reif
Katherine Lee
Adam Roberts
...
Denny Zhou
Jason W. Wei
Kevin Robinson
David M. Mimno
Daphne Ippolito
26
149
0
22 May 2023
PaLM 2 Technical Report
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
116
1,148
0
17 May 2023
Beyond the Safeguards: Exploring the Security Risks of ChatGPT
Beyond the Safeguards: Exploring the Security Risks of ChatGPT
Erik Derner
Kristina Batistic
SILM
27
65
0
13 May 2023
Appropriateness is all you need!
Appropriateness is all you need!
Hendrik Kempt
A. Lavie
S. Nagel
28
1
0
27 Apr 2023
A Group-Specific Approach to NLP for Hate Speech Detection
A Group-Specific Approach to NLP for Hate Speech Detection
Karina Halevy
28
1
0
21 Apr 2023
12
Next