ResearchTrend.AI
arXiv:2209.03463

Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots

7 September 2022
Waiman Si
Michael Backes
Jeremy Blackburn
Emiliano De Cristofaro
Gianluca Stringhini
Savvas Zannettou
Yang Zhang

Papers citing "Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots"

10 / 10 papers shown
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li
Wai Man Si
Michael Backes
Yang Zhang
Yisen Wang
118
19
0
03 Jan 2025
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
Kunsheng Tang
Wenbo Zhou
Jie Zhang
Aishan Liu
Gelei Deng
Shuai Li
Peigui Qi
Weiming Zhang
Tianwei Zhang
Nenghai Yu
135
4
0
22 Aug 2024
A Map of Exploring Human Interaction patterns with LLM: Insights into Collaboration and Creativity
Jiayang Li
Jiale Li
109
8
0
06 Apr 2024
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation
Bangyan He
Xiaojun Jia
Siyuan Liang
Tianrui Lou
Yang Liu
Xiaochun Cao
AAML, VLM
109
29
0
08 Dec 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
99
9
0
09 Nov 2023
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Gelei Deng
Yi Liu
Yuekang Li
Kailong Wang
Ying Zhang
Zefeng Li
Haoyu Wang
Tianwei Zhang
Yang Liu
SILM
99
136
0
16 Jul 2023
Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety
Christopher Homan
Greg Serapio-García
Lora Aroyo
Mark Díaz
Alicia Parrish
Vinodkumar Prabhakaran
Alex S. Taylor
Ding Wang
86
9
0
20 Jun 2023
Safer Conversational AI as a Source of User Delight
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
Chai Research
70
3
0
18 Apr 2023
Talking Abortion (Mis)information with ChatGPT on TikTok
Filipo Sharevski
J. Loop
Peter Jachim
Amy Devine
Emma Pieroni
84
6
0
23 Feb 2023
Beam Search Strategies for Neural Machine Translation
Markus Freitag
Yaser Al-Onaizan
129
396
0
06 Feb 2017