ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.16822
  4. Cited By
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

26 February 2024
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
Manish Bhatt
Yuning Mao
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktaschel
Roberta Raileanu
    SyDa
ArXivPDFHTML

Papers citing "Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts"

19 / 19 papers shown
Title
Adversarial Coevolutionary Illumination with Generational Adversarial MAP-Elites
Adversarial Coevolutionary Illumination with Generational Adversarial MAP-Elites
Timothée Anne
Noah Syrkis
Meriem Elhosni
Florian Turati
Franck Legendre
Alain Jaquier
Sebastian Risi
26
0
0
10 May 2025
MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming
Stefan Schoepf
Muhammad Zaid Hameed
Ambrish Rawat
Kieran Fraser
Giulio Zizzo
Giandomenico Cornacchia
Mark Purcell
42
0
0
08 Mar 2025
Shh, don't say that! Domain Certification in LLMs
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde
Alasdair Paren
Preetham Arvind
Maxime Kayser
Tom Rainforth
Thomas Lukasiewicz
Guohao Li
Philip H. S. Torr
Adel Bibi
53
1
0
26 Feb 2025
Investigating Non-Transitivity in LLM-as-a-Judge
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu
Laura Ruis
Tim Rocktaschel
Robert Kirk
43
0
0
19 Feb 2025
Fast Proxies for LLM Robustness Evaluation
Fast Proxies for LLM Robustness Evaluation
Tim Beyer
Jan Schuchardt
Leo Schwinn
Stephan Günnemann
AAML
46
0
0
14 Feb 2025
FLAME: Flexible LLM-Assisted Moderation Engine
FLAME: Flexible LLM-Assisted Moderation Engine
Ivan Bakulin
Ilia Kopanichuk
Iaroslav Bespalov
Nikita Radchenko
V. Shaposhnikov
Dmitry V. Dylov
Ivan Oseledets
96
0
0
13 Feb 2025
Jailbreaking to Jailbreak
Jailbreaking to Jailbreak
Jeremy Kritz
Vaughn Robinson
Robert Vacareanu
Bijan Varjavand
Michael Choi
Bobby Gogov
Scale Red Team
Summer Yue
Willow Primack
Zifan Wang
209
1
0
09 Feb 2025
Evolution and The Knightian Blindspot of Machine Learning
Evolution and The Knightian Blindspot of Machine Learning
Joel Lehman
Elliot Meyerson
Tarek El-Gaaly
Kenneth O. Stanley
Tarin Ziyaee
86
1
0
22 Jan 2025
Towards Effective Discrimination Testing for Generative AI
Towards Effective Discrimination Testing for Generative AI
Thomas P. Zollo
Nikita Rajaneesh
Richard Zemel
Talia B. Gillis
Emily Black
30
1
0
31 Dec 2024
Agent Skill Acquisition for Large Language Models via CycleQD
Agent Skill Acquisition for Large Language Models via CycleQD
So Kuroki
Taishi Nakamura
Takuya Akiba
Yujin Tang
MoMe
34
0
0
16 Oct 2024
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Guobin Shen
Dongcheng Zhao
Yiting Dong
Xiang-Yu He
Yi Zeng
AAML
45
0
0
03 Oct 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
40
10
0
27 Jul 2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie
Xiangyu Qi
Yi Zeng
Yangsibo Huang
Udari Madhushani Sehwag
...
Bo Li
Kai Li
Danqi Chen
Peter Henderson
Prateek Mittal
ALM
ELM
58
51
0
20 Jun 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
63
12
0
28 May 2024
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large
  Language Models
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models
Manish P Bhatt
Sahana Chennabasappa
Yue Li
Cyrus Nikolaidis
Daniel Song
...
Yaohui Chen
Dhaval Kapil
David Molnar
Spencer Whitman
Joshua Saxe
ELM
40
34
0
19 Apr 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated
  Jailbreak Prompts
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
117
301
0
19 Sep 2023
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model
  Meta-AI (LLaMA) Using Medical Domain Knowledge
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
Yunxiang Li
Zihan Li
Kai Zhang
Ruilong Dan
Steven Jiang
You Zhang
LM&MA
AI4MH
125
377
0
24 Mar 2023
MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement
  Learning
MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning
Mikayel Samvelyan
Akbir Khan
Michael Dennis
Minqi Jiang
Jack Parker-Holder
Jakob N. Foerster
Roberta Raileanu
Tim Rocktaschel
54
24
0
06 Mar 2023
Replay-Guided Adversarial Environment Design
Replay-Guided Adversarial Environment Design
Minqi Jiang
Michael Dennis
Jack Parker-Holder
Jakob N. Foerster
Edward Grefenstette
Tim Rocktaschel
129
95
0
06 Oct 2021
1