ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.17336
  4. Cited By
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of
  Large Language Models

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models

26 March 2024
Zhiyuan Yu
Xiaogeng Liu
Shunning Liang
Zach Cameron
Chaowei Xiao
Ning Zhang
ArXivPDFHTML

Papers citing "Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models"

13 / 13 papers shown
Title
Practical Reasoning Interruption Attacks on Reasoning Large Language Models
Practical Reasoning Interruption Attacks on Reasoning Large Language Models
Yu Cui
Cong Zuo
SILM
AAML
LRM
34
0
0
10 May 2025
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
Marco Arazzi
Vignesh Kumar Kembu
Antonino Nocera
V. P.
82
0
0
30 Apr 2025
Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary
Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary
Yakai Li
Jiekang Hu
Weiduan Sang
Luping Ma
Jing Xie
Weijuan Zhang
Aimin Yu
Shijie Zhao
Qingjia Huang
Qihang Zhou
AAML
52
0
0
28 Apr 2025
T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models
T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models
Siyuan Liang
Jiayang Liu
Jiecheng Zhai
Tianmeng Fang
Rongcheng Tu
A. Liu
Xiaochun Cao
Dacheng Tao
VGen
61
0
0
22 Apr 2025
Controllable Context Sensitivity and the Knob Behind It
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
58
4
0
11 Nov 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
58
13
0
13 Jun 2024
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran
Jinyuan Liu
Yichen Gong
Jingyi Zheng
Xinlei He
Tianshuo Cong
Anyu Wang
ELM
47
10
0
13 Jun 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang
Daoyuan Wu
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Shuai Wang
Yingjiu Li
Yang Liu
Ning Liu
Juergen Rahmel
AAML
79
8
0
08 Jun 2024
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on
  Large Language Models
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
Dong Shu
Mingyu Jin
Suiyuan Zhu
Beichen Wang
Zihao Zhou
Chong Zhang
Yongfeng Zhang
ELM
47
12
0
17 Jan 2024
Poisoning Language Models During Instruction Tuning
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
106
186
0
01 May 2023
Ask Me Anything: A simple strategy for prompting language models
Ask Me Anything: A simple strategy for prompting language models
Simran Arora
A. Narayan
Mayee F. Chen
Laurel J. Orr
Neel Guha
Kush S. Bhatia
Ines Chami
Frederic Sala
Christopher Ré
ReLM
LRM
235
208
0
05 Oct 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
372
12,081
0
04 Mar 2022
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,831
0
14 Dec 2020
1