Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.08195
Cited By
Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation
11 March 2025
Wenlong Meng
Fan Zhang
Wendao Yao
Zhenyuan Guo
Yongqian Li
Chengkun Wei
Wenzhi Chen
AAML
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation"
26 / 26 papers shown
Title
Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models
Mingyu Yu
Wei Wang
Y. X. Wei
Sujuan Qin
AAML
24
0
0
29 May 2025
Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models
Yisheng Zhong
Yizhu Wen
Junfeng Guo
Mehran Kafai
Heng Huang
Hanqing Guo
Zhuangdi Zhu
72
0
0
19 May 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang
Zhangchen Xu
Luyao Niu
Bill Yuchen Lin
Radha Poovendran
SILM
116
11
0
08 Jan 2025
AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs
Lijia Lv
Weigang Zhang
Xuehai Tang
Jie Wen
Feng Liu
Jizhong Han
Songlin Hu
AAML
70
2
0
11 Sep 2024
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko
Nicolas Flammarion
123
35
0
16 Jul 2024
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Ruoyu Qin
Zheming Li
Weiran He
Mingxing Zhang
Yongwei Wu
Weimin Zheng
Xinran Xu
133
66
0
24 Jun 2024
Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks
Chen Xiong
Xiangyu Qi
Pin-Yu Chen
Tsung-Yi Ho
AAML
106
22
0
30 May 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
144
402
0
06 Apr 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko
Francesco Croce
Nicolas Flammarion
AAML
161
221
0
02 Apr 2024
ORPO: Monolithic Preference Optimization without Reference Model
Jiwoo Hong
Noah Lee
James Thorne
OSLM
100
266
0
12 Mar 2024
Learning or Self-aligning? Rethinking Instruction Fine-tuning
Mengjie Ren
Boxi Cao
Hongyu Lin
Liu Cao
Xianpei Han
Ke Zeng
Guanglu Wan
Xunliang Cai
Le Sun
69
29
0
28 Feb 2024
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement
Wenda Xu
Guanglei Zhu
Xuandong Zhao
Liangming Pan
Lei Li
Wenjie Wang
91
63
0
18 Feb 2024
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
Hao Zhao
Maksym Andriushchenko
Francesco Croce
Nicolas Flammarion
ALM
149
56
0
07 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
277
569
0
02 Feb 2024
Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack
Yu Fu
Yufei Li
Wen Xiao
Cong Liu
Yue Dong
AAML
94
5
0
12 Dec 2023
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
Bochuan Cao
Yu Cao
Lu Lin
Jinghui Chen
AAML
76
152
0
18 Sep 2023
YaRN: Efficient Context Window Extension of Large Language Models
Bowen Peng
Jeffrey Quesnelle
Honglu Fan
Enrico Shippole
OSLM
81
264
0
31 Aug 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
295
1,518
0
27 Jul 2023
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Gelei Deng
Yi Liu
Yuekang Li
Kailong Wang
Ying Zhang
Zefeng Li
Haoyu Wang
Tianwei Zhang
Yang Liu
SILM
87
134
0
16 Jul 2023
H
2
_2
2
O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
153
313
0
24 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
389
4,169
0
29 May 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,761
0
15 Mar 2023
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
535
6,301
0
05 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
891
13,228
0
04 Mar 2022
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Sewon Min
Xinxi Lyu
Ari Holtzman
Mikel Artetxe
M. Lewis
Hannaneh Hajishirzi
Luke Zettlemoyer
LLMAG
LRM
191
1,501
0
25 Feb 2022
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,332
0
27 Aug 2019
1