ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
arXiv 2304.05335 · 11 April 2023
Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan
Topics: LM&MA, LLMAG
Links: arXiv · PDF · HTML

Papers citing "Toxicity in ChatGPT: Analyzing Persona-assigned Language Models"

Showing 16 of 66 citing papers
  1. Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting
     Tilman Beck, Hendrik Schuff, Anne Lauscher, Iryna Gurevych
     34 citations · 13 Sep 2023
  2. CMD: a framework for Context-aware Model self-Detoxification
     Zecheng Tang, Keyan Zhou, Juntao Li, Yuyang Ding, Pinzheng Wang, Bowen Yan, Min Zhang
     Topics: MU · 5 citations · 16 Aug 2023
  3. Collective Human Opinions in Semantic Textual Similarity
     Yuxia Wang, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, Karin Verspoor
     4 citations · 08 Aug 2023
  4. Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
     Yu Fu, Deyi Xiong, Yue Dong
     Topics: WaLM · 32 citations · 25 Jul 2023
  5. Large Language Models as Superpositions of Cultural Perspectives
     Grgur Kovač, Masataka Sawayama, Rémy Portelas, Cédric Colas, Peter Ford Dominey, Pierre-Yves Oudeyer
     Topics: LLMAG · 33 citations · 15 Jul 2023
  6. On the Exploitability of Instruction Tuning
     Manli Shu, Jiong Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein
     Topics: SILM · 92 citations · 28 Jun 2023
  7. In-Context Impersonation Reveals Large Language Models' Strengths and Biases
     Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, Zeynep Akata
     151 citations · 24 May 2023
  8. Aligning Language Models to User Opinions
     EunJeong Hwang, Bodhisattwa Prasad Majumder, Niket Tandon
     62 citations · 24 May 2023
  9. A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
     Xiaowei Huang, Wenjie Ruan, Wei Huang, Gao Jin, Yizhen Dong, ..., Sihao Wu, Peipei Xu, Dengyu Wu, André Freitas, Mustafa A. Mustafa
     Topics: ALM · 83 citations · 19 May 2023
  10. Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
     Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, C. Endres, Thorsten Holz, Mario Fritz
     Topics: SILM · 443 citations · 23 Feb 2023
  11. Revision Transformers: Instructing Language Models to Change their Values
     Felix Friedrich, Wolfgang Stammer, P. Schramowski, Kristian Kersting
     Topics: KELM · 6 citations · 19 Oct 2022
  12. Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy
     Allison Lahnala, Charles F Welch, Béla Neuendorf, Lucie Flek
     13 citations · 15 May 2022
  13. Training language models to follow instructions with human feedback
     Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
     Topics: OSLM, ALM · 12,081 citations · 04 Mar 2022
  14. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
     Timo Schick, Sahana Udupa, Hinrich Schütze
     373 citations · 28 Feb 2021
  15. Scaling Laws for Neural Language Models
     Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
     4,505 citations · 23 Jan 2020
  16. The Woman Worked as a Babysitter: On Biases in Language Generation
     Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng
     620 citations · 03 Sep 2019