ResearchTrend.AI

Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

17 June 2024
Vernon Toh Yan Han
Rishabh Bhardwaj
Soujanya Poria
arXiv: 2406.11654

Papers citing "Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming"

8 / 8 papers shown
RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
Quy-Anh Dang
Chris Ngo
Truong-Son Hy
AAML
SyDa
33
0
0
21 Apr 2025
Towards Effective Discrimination Testing for Generative AI
Thomas P. Zollo
Nikita Rajaneesh
Richard Zemel
Talia B. Gillis
Emily Black
35
1
0
31 Dec 2024
Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
Catherine Yeh
Donghao Ren
Yannick Assogba
Dominik Moritz
Fred Hohman
40
0
0
01 Oct 2024
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
Tej Deep Pala
Vernon Y.H. Toh
Rishabh Bhardwaj
Soujanya Poria
AAML
31
2
0
20 Aug 2024
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
...
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktäschel
Roberta Raileanu
SyDa
83
64
0
26 Feb 2024
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj
Soujanya Poria
ALM
57
16
0
22 Oct 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
360
3,029
0
22 Mar 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
239
506
0
28 Sep 2022