Leashing the Inner Demons: Self-Detoxification for Language Models

6 March 2022

Papers citing "Leashing the Inner Demons: Self-Detoxification for Language Models"

Title | Authors | Metrics | Date
Combating Toxic Language: A Review of LLM-Based Strategies for Software Engineering | Hao Zhuo, Yicheng Yang, Kewen Peng | 30 0 0 | 21 Apr 2025
LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots | Dongge Han, Trevor A. McInroe, Adam Jelley, Stefano V. Albrecht, Peter Bell, Amos Storkey | 61 11 0 | 31 Dec 2024
FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts | Caroline Brun, Vassilina Nikoulina | 40 1 0 | 25 Jun 2024
CMD: a framework for Context-aware Model self-Detoxification | Zecheng Tang, Keyan Zhou, Juntao Li, Yuyang Ding, Pinzheng Wang, Bowen Yan, Minzhang MU | 25 5 0 | 16 Aug 2023
Synthetic Pre-Training Tasks for Neural Machine Translation | Zexue He, Graeme W. Blackwood, Yikang Shen, Julian McAuley, Rogerio Feris | 29 3 0 | 19 Dec 2022
Controlling Bias Exposure for Fair Interpretable Predictions | Zexue He, Yu Wang, Julian McAuley, Bodhisattwa Prasad Majumder | 27 19 0 | 14 Oct 2022
Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots | Waiman Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang | 36 58 0 | 07 Sep 2022
The Woman Worked as a Babysitter: On Biases in Language Generation | Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng | 225 621 0 | 03 Sep 2019