ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.16656
  4. Cited By
Gradient-Based Language Model Red Teaming

Gradient-Based Language Model Red Teaming

30 January 2024
Nevan Wichers
Carson E. Denison
Ahmad Beirami
ArXivPDFHTML

Papers citing "Gradient-Based Language Model Red Teaming"

11 / 11 papers shown
Title
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Jonathan Nöther
Adish Singla
Goran Radanović
AAML
97
0
0
14 Jan 2025
Universal and Transferable Adversarial Attacks on Aligned Language
  Models
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
160
1,376
0
27 Jul 2023
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
Deokjae Lee
JunYeong Lee
Jung-Woo Ha
Jin-Hwa Kim
Sang-Woo Lee
Hwaran Lee
Hyun Oh Song
AAML
48
23
0
27 May 2023
Robust Conversational Agents against Imperceptible Toxicity Triggers
Robust Conversational Agents against Imperceptible Toxicity Triggers
Ninareh Mehrabi
Ahmad Beirami
Fred Morstatter
Aram Galstyan
AAML
46
33
0
05 May 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and
  Implicit Hate Speech Detection
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
42
362
0
17 Mar 2022
A New Generation of Perspective API: Efficient Multilingual
  Character-level Transformers
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
Alyssa Lees
Vinh Q. Tran
Yi Tay
Jeffrey Scott Sorensen
Jai Gupta
Donald Metzler
Lucy Vasserman
50
182
0
22 Feb 2022
LaMDA: Language Models for Dialog Applications
LaMDA: Language Models for Dialog Applications
R. Thoppilan
Daniel De Freitas
Jamie Hall
Noam M. Shazeer
Apoorv Kulshreshtha
...
Blaise Aguera-Arcas
Claire Cui
M. Croak
Ed H. Chi
Quoc Le
ALM
96
1,577
0
20 Jan 2022
A Plug-and-Play Method for Controlled Text Generation
A Plug-and-Play Method for Controlled Text Generation
Damian Pascual
Béni Egressy
Clara Meister
Ryan Cotterell
Roger Wattenhofer
96
91
0
20 Sep 2021
FUDGE: Controlled Text Generation With Future Discriminators
FUDGE: Controlled Text Generation With Future Discriminators
Kevin Kaichuang Yang
Dan Klein
87
324
0
12 Apr 2021
Content preserving text generation with attribute controls
Content preserving text generation with attribute controls
Lajanugen Logeswaran
Honglak Lee
Samy Bengio
59
117
0
03 Nov 2018
The Concrete Distribution: A Continuous Relaxation of Discrete Random
  Variables
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
Chris J. Maddison
A. Mnih
Yee Whye Teh
BDL
111
2,523
0
02 Nov 2016
1