Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

17 June 2024

Vernon Toh Yan Han

Rishabh Bhardwaj

Soujanya Poria

Papers citing "Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming"

RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search. Quy-Anh Dang, Chris Ngo, Truong-Son Hy. 21 Apr 2025. Tags: AAML, SyDa.
Towards Effective Discrimination Testing for Generative AI. Thomas P. Zollo, Nikita Rajaneesh, Richard Zemel, Talia B. Gillis, Emily Black. 31 Dec 2024.
Exploring Empty Spaces: Human-in-the-Loop Data Augmentation. Catherine Yeh, Donghao Ren, Yannick Assogba, Dominik Moritz, Fred Hohman. 01 Oct 2024.
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique. Tej Deep Pala, Vernon Y.H. Toh, Rishabh Bhardwaj, Soujanya Poria. 20 Aug 2024. Tags: AAML.
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts. Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, ..., Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu. 26 Feb 2024. Tags: SyDa.
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases. Rishabh Bhardwaj, Soujanya Poria. 22 Oct 2023. Tags: ALM.
Sparks of Artificial General Intelligence: Early experiments with GPT-4. Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang. 22 Mar 2023. Tags: ELM, AI4MH, AI4CE, ALM.
Improving alignment of dialogue agents via targeted human judgements. Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving. 28 Sep 2022. Tags: ALM, AAML.