Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.03072
Cited By
Leashing the Inner Demons: Self-Detoxification for Language Models
6 March 2022
Canwen Xu
Zexue He
Zhankui He
Julian McAuley
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Leashing the Inner Demons: Self-Detoxification for Language Models"
8 / 8 papers shown
Title
Combating Toxic Language: A Review of LLM-Based Strategies for Software Engineering
Hao Zhuo
Yicheng Yang
Kewen Peng
30
0
0
21 Apr 2025
LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots
Dongge Han
Trevor A. McInroe
Adam Jelley
Stefano V. Albrecht
Peter Bell
Amos Storkey
61
11
0
31 Dec 2024
FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts
Caroline Brun
Vassilina Nikoulina
40
1
0
25 Jun 2024
CMD: a framework for Context-aware Model self-Detoxification
Zecheng Tang
Keyan Zhou
Juntao Li
Yuyang Ding
Pinzheng Wang
Bowen Yan
Minzhang
MU
25
5
0
16 Aug 2023
Synthetic Pre-Training Tasks for Neural Machine Translation
Zexue He
Graeme W. Blackwood
Yikang Shen
Julian McAuley
Rogerio Feris
29
3
0
19 Dec 2022
Controlling Bias Exposure for Fair Interpretable Predictions
Zexue He
Yu Wang
Julian McAuley
Bodhisattwa Prasad Majumder
27
19
0
14 Oct 2022
Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
Waiman Si
Michael Backes
Jeremy Blackburn
Emiliano De Cristofaro
Gianluca Stringhini
Savvas Zannettou
Yang Zhang
36
58
0
07 Sep 2022
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
225
621
0
03 Sep 2019
1