2209.03463
Cited By
Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
7 September 2022
Waiman Si
Michael Backes
Jeremy Blackburn
Emiliano De Cristofaro
Gianluca Stringhini
Savvas Zannettou
Yang Zhang
Papers citing "Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots"
37 / 37 papers shown
Title
LM-Scout: Analyzing the Security of Language Model Integration in Android Apps
Muhammad Ibrahim
Güliz Seray Tuncay
Z. Berkay Celik
Aravind Machiry
Antonio Bianchi
31
0
0
13 May 2025
The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships
Renwen Zhang
Han Li
Han Meng
Jinyuan Zhan
Hongyuan Gan
Yi-Chieh Lee
24
6
0
26 Oct 2024
Vision Language Models Can Parse Floor Plan Maps
David DeFazio
Hrudayangam Mehta
Jeremy Blackburn
Shiqi Zhang
CoGe
23
0
0
19 Sep 2024
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
Bocheng Chen
Hanqing Guo
Guangjing Wang
Yuanda Wang
Qiben Yan
AAML
37
4
0
01 Sep 2024
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
Kunsheng Tang
Wenbo Zhou
Jie Zhang
Aishan Liu
Gelei Deng
Shuai Li
Peigui Qi
Weiming Zhang
Tianwei Zhang
Nenghai Yu
39
3
0
22 Aug 2024
Efficient Detection of Toxic Prompts in Large Language Models
Yi Liu
Junzhe Yu
Huijia Sun
Ling Shi
Gelei Deng
Yuqi Chen
Yang Liu
29
4
0
21 Aug 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang
Goutham Rajendran
Pradeep Ravikumar
Bryon Aragam
CLL
KELM
34
6
0
26 Jun 2024
A Map of Exploring Human Interaction patterns with LLM: Insights into Collaboration and Creativity
Jiayang Li
Jiale Li
37
7
0
06 Apr 2024
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
Ruiyi Wang
Haofei Yu
W. Zhang
Zhengyang Qi
Maarten Sap
Graham Neubig
Yonatan Bisk
Hao Zhu
LLMAG
43
37
0
13 Mar 2024
Prompt Stealing Attacks Against Large Language Models
Zeyang Sha
Yang Zhang
SILM
AAML
43
28
0
20 Feb 2024
Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
Gelei Deng
Yi Liu
Kailong Wang
Yuekang Li
Tianwei Zhang
Yang Liu
18
41
0
13 Feb 2024
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation
Bangyan He
Xiaojun Jia
Siyuan Liang
Tianrui Lou
Yang Liu
Xiaochun Cao
AAML
VLM
31
23
0
08 Dec 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
37
9
0
09 Nov 2023
Comprehensive Assessment of Toxicity in ChatGPT
Boyang Zhang
Xinyue Shen
Waiman Si
Zeyang Sha
Z. Chen
Ahmed Salem
Yun Shen
Michael Backes
Yang Zhang
SILM
21
3
0
03 Nov 2023
An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu
Keyi Kong
Ning Liu
Li-zhen Cui
Di Wang
Jingfeng Zhang
Mohan S. Kankanhalli
AAML
SILM
25
68
0
20 Oct 2023
Prompt Packer: Deceiving LLMs through Compositional Instruction with Hidden Attacks
Shuyu Jiang
Xingshu Chen
Rui Tang
24
22
0
16 Oct 2023
Low-Resource Languages Jailbreak GPT-4
Zheng-Xin Yong
Cristina Menghini
Stephen H. Bach
SILM
31
170
0
03 Oct 2023
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions
Yufan Chen
Arjun Arunasalam
Z. Berkay Celik
30
34
0
03 Oct 2023
Bias and Fairness in Chatbots: An Overview
Jintang Xue
Yun Cheng Wang
Chengwei Wei
Xiaofeng Liu
Jonghye Woo
C.-C. Jay Kuo
36
29
0
16 Sep 2023
AI in the Gray: Exploring Moderation Policies in Dialogic Large Language Models vs. Human Answers in Controversial Topics
V. Ghafouri
Vibhor Agarwal
Yong Zhang
Nishanth R. Sastry
Jose Such
Guillermo Suarez-Tangil
AI4MH
21
21
0
28 Aug 2023
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen
Z. Chen
Michael Backes
Yun Shen
Yang Zhang
SILM
40
245
0
07 Aug 2023
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Gelei Deng
Yi Liu
Yuekang Li
Kailong Wang
Ying Zhang
Zefeng Li
Haoyu Wang
Tianwei Zhang
Yang Liu
SILM
37
118
0
16 Jul 2023
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
Bocheng Chen
Guangjing Wang
Hanqing Guo
Yuanda Wang
Qiben Yan
30
15
0
14 Jul 2023
Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety
Christopher Homan
Greg Serapio-García
Lora Aroyo
Mark Díaz
Alicia Parrish
Vinodkumar Prabhakaran
Alex S. Taylor
Ding Wang
22
9
0
20 Jun 2023
DICES Dataset: Diversity in Conversational AI Evaluation for Safety
Lora Aroyo
Alex S. Taylor
Mark Díaz
Christopher Homan
Alicia Parrish
Greg Serapio-García
Vinodkumar Prabhakaran
Ding Wang
26
33
0
20 Jun 2023
Prompt Injection attack against LLM-integrated Applications
Yi Liu
Gelei Deng
Yuekang Li
Kailong Wang
Zihao Wang
...
Tianwei Zhang
Yepang Liu
Haoyu Wang
Yanhong Zheng
Yang Liu
SILM
41
317
0
08 Jun 2023
BiasAsker: Measuring the Bias in Conversational AI System
Yuxuan Wan
Wenxuan Wang
Pinjia He
Jiazhen Gu
Haonan Bai
Michael Lyu
29
67
0
21 May 2023
Generating Phishing Attacks using ChatGPT
Sayak Saha Roy
Krishna Vamsi Naragam
Shirin Nilizadeh
32
34
0
09 May 2023
Safer Conversational AI as a Source of User Delight
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
Chai Research
23
3
0
18 Apr 2023
Talking Abortion (Mis)information with ChatGPT on TikTok
Filipo Sharevski
J. Loop
Peter Jachim
Amy Devine
Emma Pieroni
31
5
0
23 Feb 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao Sun
Zhexin Zhang
Minlie Huang
LM&MA
ELM
28
16
0
18 Feb 2023
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Terry Yue Zhuo
Yujin Huang
Chunyang Chen
Zhenchang Xing
SILM
28
102
0
30 Jan 2023
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
250
193
0
15 Sep 2021
"A Virus Has No Religion": Analyzing Islamophobia on Twitter During the COVID-19 Outbreak
Mohit Chandra
Manvith Reddy
Shradha Sehgal
Saurabh Gupta
Arun Balaji Buduru
Ponnurangam Kumaraguru
21
41
0
11 Jul 2021
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
214
616
0
03 Sep 2019
A Survey on Bias and Fairness in Machine Learning
Ninareh Mehrabi
Fred Morstatter
N. Saxena
Kristina Lerman
Aram Galstyan
SyDa
FaML
323
4,212
0
23 Aug 2019
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
214
1,326
0
05 Jun 2016