2401.16656
Gradient-Based Language Model Red Teaming
30 January 2024
Nevan Wichers
Carson E. Denison
Ahmad Beirami
Papers citing "Gradient-Based Language Model Red Teaming"
11 / 11 papers shown
Title
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Jonathan Nöther
Adish Singla
Goran Radanović
AAML
97
0
0
14 Jan 2025
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
160
1,376
0
27 Jul 2023
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
Deokjae Lee
JunYeong Lee
Jung-Woo Ha
Jin-Hwa Kim
Sang-Woo Lee
Hwaran Lee
Hyun Oh Song
AAML
48
23
0
27 May 2023
Robust Conversational Agents against Imperceptible Toxicity Triggers
Ninareh Mehrabi
Ahmad Beirami
Fred Morstatter
Aram Galstyan
AAML
46
33
0
05 May 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
42
362
0
17 Mar 2022
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
Alyssa Lees
Vinh Q. Tran
Yi Tay
Jeffrey Scott Sorensen
Jai Gupta
Donald Metzler
Lucy Vasserman
50
182
0
22 Feb 2022
LaMDA: Language Models for Dialog Applications
R. Thoppilan
Daniel De Freitas
Jamie Hall
Noam M. Shazeer
Apoorv Kulshreshtha
...
Blaise Aguera-Arcas
Claire Cui
M. Croak
Ed H. Chi
Quoc Le
ALM
96
1,577
0
20 Jan 2022
A Plug-and-Play Method for Controlled Text Generation
Damian Pascual
Béni Egressy
Clara Meister
Ryan Cotterell
Roger Wattenhofer
96
91
0
20 Sep 2021
FUDGE: Controlled Text Generation With Future Discriminators
Kevin Kaichuang Yang
Dan Klein
87
324
0
12 Apr 2021
Content preserving text generation with attribute controls
Lajanugen Logeswaran
Honglak Lee
Samy Bengio
59
117
0
03 Nov 2018
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
Chris J. Maddison
A. Mnih
Yee Whye Teh
BDL
111
2,523
0
02 Nov 2016