Automated Red Teaming with GOAT: the Generative Offensive Agent Tester

2 October 2024

Cristian Canton Ferrer

Aaron Grattafiori

Papers citing "Automated Red Teaming with GOAT: the Generative Offensive Agent Tester"

Title | Authors | Tags | | Date
TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts | T. Krauß, Hamid Dashtbani, Alexandra Dmitrienko | | 25 0 0 | 09 Jun 2025
Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models | Ren-Jian Wang, Ke Xue, Zeyu Qin, Ziniu Li, Sheng Tang, Hao-Tian Li, Shengcai Liu, Chao Qian | AAML | 24 0 0 | 08 Jun 2025
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents | Christian Schroeder de Witt | AAML, AI4CE | 485 6 0 | 04 May 2025
WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks | Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri | AAML | 118 4 0 | 22 Apr 2025
Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning | Tian Jin, Xiao Yu, Ninareh Mehrabi, Rahul Gupta, Zhou Yu, Ruoxi Jia | AAML, LLMAG | 110 0 0 | 02 Apr 2025
Tempest: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search | Andy Zhou, Ron Arel | MU | 145 0 0 | 13 Mar 2025
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks | Hanjiang Hu, Alexander Robey, Changliu Liu | AAML, LLMSV | 107 2 0 | 28 Feb 2025
TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice | Aman Goel, Xian Carrie Wu, Zhe Wang, Dmitriy Bespalov, Yanjun Qi | | 116 0 0 | 21 Feb 2025
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion | AAML | 210 222 0 | 02 Apr 2024
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack | M. Russinovich, Ahmed Salem, Ronen Eldan | | 122 98 0 | 02 Apr 2024