Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation
arXiv:2405.13068 · 20 May 2024
Yuxi Li, Yi Liu, Yuekang Li, Ling Shi, Gelei Deng, Shengquan Chen, Kailong Wang
Papers citing "Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation" (13 / 13 papers shown)
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
ALM · 82 · 44 · 0 · 20 Jan 2025
Gen-AI for User Safety: A Survey
Akshar Prabhu Desai, Tejasvi Ravi, Mohammad Luqman, Mohit Sharma, Nithya Kota, Pranjul Yadav
33 · 1 · 0 · 10 Nov 2024
Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
Zhipeng Wei, Yuqi Liu, N. Benjamin Erichson
AAML · 53 · 1 · 0 · 01 Nov 2024
JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
Fan Liu, Yue Feng, Zhao Xu, Lixin Su, Xinyu Ma, Dawei Yin, Hao Liu
ELM · 32 · 7 · 0 · 11 Oct 2024
Recent Advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
Tarun Raheja, Nilay Pochhi
AAML · 51 · 1 · 0 · 09 Oct 2024
Efficient Detection of Toxic Prompts in Large Language Models
Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu
29 · 4 · 0 · 21 Aug 2024
GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
Zhibo Zhang, Wuxia Bai, Yuxi Li, Max Q.-H. Meng, Kaidi Wang, Ling Shi, Li Li, Jun Wang, Haoyu Wang
24 · 4 · 0 · 09 Aug 2024
Misinforming LLMs: Vulnerabilities, Challenges and Opportunities
Jaroslaw Kornowicz, Daniel Geissler, Kirsten Thommes
30 · 2 · 0 · 02 Aug 2024
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu, Fan Liu, Hao Liu
AAML · 42 · 8 · 0 · 13 Jun 2024
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze
KELM · 109 · 9 · 0 · 17 Apr 2024
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, ..., Nicolas Flammarion, George J. Pappas, F. Tramèr, Hamed Hassani, Eric Wong
ALM, ELM, AAML · 57 · 96 · 0 · 28 Mar 2024
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang
LLMAG · 44 · 73 · 0 · 05 Mar 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing
SILM · 117 · 301 · 0 · 19 Sep 2023