ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack


2 April 2024
M. Russinovich
Ahmed Salem
Ronen Eldan
arXiv:2404.01833 [PDF] [HTML]

Papers citing "Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack"

50 / 66 papers shown
  1. Access Controls Will Solve the Dual-Use Dilemma. Evžen Wybitul [AAML]. 14 May 2025.
  2. LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities. Kalyan Nakka, Jimmy Dani, Ausmit Mondal, Nitesh Saxena [AAML]. 08 May 2025.
  3. Transferable Adversarial Attacks on Black-Box Vision-Language Models. Kai Hu, Weichen Yu, L. Zhang, Alexander Robey, Andy Zou, Chengming Xu, Haoqi Hu, Matt Fredrikson [AAML, VLM]. 02 May 2025.
  4. Safety in Large Reasoning Models: A Survey. Cheng Wang, Yong-Jin Liu, Yangqiu Song, Duzhen Zhang, ZeLin Li, Junfeng Fang, Bryan Hooi [LRM]. 24 Apr 2025.
  5. The Structural Safety Generalization Problem. Julius Broomfield, Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Tia Nasir, Jason Zhang, Reihaneh Iranmanesh, Sara Pieri, Reihaneh Rabbany, Kellin Pelrine [AAML]. 13 Apr 2025.
  6. Bypassing Safety Guardrails in LLMs Using Humor. Pedro Cisneros-Velarde. 09 Apr 2025.
  7. Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty. Yu Inatsu. 04 Apr 2025.
  8. Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning. S. Chen, Xiao Yu, Ninareh Mehrabi, Rahul Gupta, Zhou Yu, Ruoxi Jia [AAML, LLMAG]. 02 Apr 2025.
  9. Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing. Johan Wahréus, Ahmed Mohamed Hussain, P. Papadimitratos. 27 Mar 2025.
  10. MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks. Wenhao You, Bryan Hooi, Yiwei Wang, Yixuan Wang, Zong Ke, Ming Yang, Zi Huang, Yujun Cai [AAML]. 24 Mar 2025.
  11. Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search. Andy Zhou [MU]. 13 Mar 2025.
  12. Safety Guardrails for LLM-Enabled Robots. Zachary Ravichandran, Alexander Robey, Vijay R. Kumar, George Pappas, Hamed Hassani. 10 Mar 2025.
  13. Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models. Thomas Winninger, Boussad Addad, Katarzyna Kapusta [AAML]. 08 Mar 2025.
  14. Jailbreaking is (Mostly) Simpler Than You Think. M. Russinovich, Ahmed Salem [AAML]. 07 Mar 2025.
  15. SafeArena: Evaluating the Safety of Autonomous Web Agents. Ada Defne Tur, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, Arkil Patel, Esin Durmus, Spandana Gella, Karolina Stañczak, Siva Reddy [LLMAG, ELM]. 06 Mar 2025.
  16. Improving LLM Safety Alignment with Dual-Objective Optimization. Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song [AAML, MU]. 05 Mar 2025.
  17. Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models. Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower [AAML]. 03 Mar 2025.
  18. Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models. Nimet Beyza Bozdag, Shuhaib Mehri, Gokhan Tur, Dilek Hakkani-Tur. 03 Mar 2025.
  19. Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks. Hanjiang Hu, Alexander Robey, Changliu Liu [AAML, LLMSV]. 28 Feb 2025.
  20. Foot-In-The-Door: A Multi-turn Jailbreak for LLMs. Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiaotian Zhang [AAML]. 27 Feb 2025.
  21. Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences. Shanshan Han, Salman Avestimehr, Chaoyang He. 12 Feb 2025.
  22. Jailbreaking to Jailbreak. Jeremy Kritz, Vaughn Robinson, Robert Vacareanu, Bijan Varjavand, Michael Choi, Bobby Gogov, Scale Red Team, Summer Yue, Willow Primack, Zifan Wang. 09 Feb 2025.
  23. Confidence Elicitation: A New Attack Vector for Large Language Models. Brian Formento, Chuan-Sheng Foo, See-Kiong Ng [AAML]. 07 Feb 2025.
  24. Lessons From Red Teaming 100 Generative AI Products. Blake Bullwinkel, Amanda Minnich, Shiven Chawla, Gary Lopez, Martin Pouliot, ..., Pete Bryan, Ram Shankar Siva Kumar, Yonatan Zunger, Chang Kawaguchi, Mark Russinovich [AAML, VLM]. 13 Jan 2025.
  25. MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue. Fengxiang Wang, Ranjie Duan, Peng Xiao, Xiaojun Jia, Shiji Zhao, ..., Hang Su, Jialing Tao, Hui Xue, Jun Zhu, Hui Xue [LLMAG]. 08 Jan 2025.
  26. Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation. Xuying Li, Zhuo Li, Yuji Kosuga, Yasuhiro Yoshida, Victor Bian [AAML]. 05 Dec 2024.
  27. RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks. Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, Min Yang [SILM]. 21 Nov 2024.
  28. Steering Language Model Refusal with Sparse Autoencoders. Kyle O'Brien, David Majercak, Xavier Fernandes, Richard Edgar, Jingya Chen, Harsha Nori, Dean Carignan, Eric Horvitz, Forough Poursabzi-Sangde [LLMSV]. 18 Nov 2024.
  29. Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks. Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, ..., Victor C. Dibia, Ahmed Hassan Awadallah, Ece Kamar, Rafah Hosn, Saleema Amershi [AI4CE, LRM, LLMAG]. 07 Nov 2024.
  30. Plentiful Jailbreaks with String Compositions. Brian R. Y. Huang [AAML]. 01 Nov 2024.
  31. Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In. Itay Nakash, George Kour, Guy Uziel, Ateret Anaby-Tavor [AAML, LLMAG]. 22 Oct 2024.
  32. Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models. Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari [AAML]. 15 Oct 2024.
  33. Fast Convergence of $Φ$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler. Siddharth Mitra, Andre Wibisono. 14 Oct 2024.
  34. JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework. Fan Liu, Yue Feng, Zhao Xu, Lixin Su, Xinyu Ma, Dawei Yin, Hao Liu [ELM]. 11 Oct 2024.
  35. Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents. Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, ..., Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang [LLMAG]. 11 Oct 2024.
  36. Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation. Tomas Bueno Momcilovic, Beat Buesser, Giulio Zizzo, Mark Purcell [AAML]. 10 Oct 2024.
  37. Towards Assuring EU AI Act Compliance and Adversarial Robustness of LLMs. Tomas Bueno Momcilovic, Beat Buesser, Giulio Zizzo, Mark Purcell, Dian Balta [AAML]. 04 Oct 2024.
  38. Developing Assurance Cases for Adversarial Robustness and Regulatory Compliance in LLMs. Tomas Bueno Momcilovic, Dian Balta, Beat Buesser, Giulio Zizzo, Mark Purcell [AAML]. 04 Oct 2024.
  39. Automated Red Teaming with GOAT: the Generative Offensive Agent Tester. Maya Pavlova, Erik Brinkman, Krithika Iyer, Vítor Albiero, Joanna Bitton, Hailey Nguyen, J. Li, Cristian Canton Ferrer, Ivan Evtimov, Aaron Grattafiori [ALM]. 02 Oct 2024.
  40. FlipAttack: Jailbreak LLMs via Flipping. Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Bryan Hooi [AAML]. 02 Oct 2024.
  41. Endless Jailbreaks with Bijection Learning. Brian R. Y. Huang, Maximilian Li, Leonard Tang [AAML]. 02 Oct 2024.
  42. PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System. Gary D. Lopez Munoz, Amanda Minnich, Roman Lutz, Richard Lundeen, Raja Sekhar Rao Dheekonda, ..., Tori Westerhoff, Chang Kawaguchi, Christian Seifert, Ram Shankar Siva Kumar, Yonatan Zunger [SILM]. 01 Oct 2024.
  43. VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data. Xuefeng Du, Reshmi Ghosh, Robert Sim, Ahmed Salem, Vitor Carvalho, Emily Lawton, Yixuan Li, Jack W. Stokes [VLM, AAML]. 01 Oct 2024.
  44. MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks. Giandomenico Cornacchia, Giulio Zizzo, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Mark Purcell. 26 Sep 2024.
  45. Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA). Alan Aqrawi, Arian Abbasi [AAML]. 04 Sep 2024.
  46. Conversational Complexity for Assessing Risk in Large Language Models. John Burden, Manuel Cebrian, José Hernández-Orallo. 02 Sep 2024.
  47. Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks. Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Jason Zhang, Julius Broomfield, Sara Pieri, Reihaneh Iranmanesh, Reihaneh Rabbany, Kellin Pelrine [AAML]. 29 Aug 2024.
  48. LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet. Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue [AAML, MU]. 27 Aug 2024.
  49. Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles. Xiongtao Sun, Deyue Zhang, Dongdong Yang, Quanchen Zou, Hui Li [AAML]. 08 Aug 2024.
  50. Does Refusal Training in LLMs Generalize to the Past Tense? Maksym Andriushchenko, Nicolas Flammarion. 16 Jul 2024.