Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.03274
Cited By
v1
v2 (latest)
Recent Advances in Attack and Defense Approaches of Large Language Models
5 September 2024
Jing Cui
Yishi Xu
Zhewei Huang
Shuchang Zhou
Jianbin Jiao
Junge Zhang
PILM
AAML
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Recent Advances in Attack and Defense Approaches of Large Language Models"
8 / 8 papers shown
Title
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu
Haoyu Zhao
Xinran Gu
Dingli Yu
Anirudh Goyal
Sanjeev Arora
ALM
133
59
0
20 Jan 2025
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa
Bhrugu Bharathi
Long Phan
Andy Zhou
Alice Gatti
...
Andy Zou
Dawn Song
Bo Li
Dan Hendrycks
Mantas Mazeika
AAML
MU
133
63
0
01 Aug 2024
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko
Nicolas Flammarion
142
36
0
16 Jul 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko
Francesco Croce
Nicolas Flammarion
AAML
206
222
0
02 Apr 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu
Sanjay Jain
Mohan S. Kankanhalli
HILM
LRM
172
259
0
22 Jan 2024
Certifying LLM Safety against Adversarial Prompting
Aounon Kumar
Chirag Agarwal
Suraj Srinivas
Aaron Jiaxun Li
Soheil Feizi
Himabindu Lakkaraju
AAML
155
197
0
06 Sep 2023
An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning
Yun Luo
Zhen Yang
Fandong Meng
Yafu Li
Jie Zhou
Yue Zhang
CLL
KELM
211
319
0
17 Aug 2023
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
Vitaly Feldman
Chiyuan Zhang
TDI
248
472
0
09 Aug 2020
1