2406.11654
Cited By
Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming
17 June 2024
Vernon Toh Yan Han
Rishabh Bhardwaj
Soujanya Poria
Papers citing "Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming"
8 / 8 papers shown
Title
RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
Quy-Anh Dang
Chris Ngo
Truong-Son Hy
AAML
SyDa
33
0
0
21 Apr 2025
Towards Effective Discrimination Testing for Generative AI
Thomas P. Zollo
Nikita Rajaneesh
Richard Zemel
Talia B. Gillis
Emily Black
35
1
0
31 Dec 2024
Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
Catherine Yeh
Donghao Ren
Yannick Assogba
Dominik Moritz
Fred Hohman
40
0
0
01 Oct 2024
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
Tej Deep Pala
Vernon Y.H. Toh
Rishabh Bhardwaj
Soujanya Poria
AAML
31
2
0
20 Aug 2024
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
...
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktäschel
Roberta Raileanu
SyDa
83
64
0
26 Feb 2024
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj
Soujanya Poria
ALM
57
16
0
22 Oct 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
360
3,029
0
22 Mar 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
239
506
0
28 Sep 2022