Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation
arXiv:2405.13068
Yuxi Li, Yi Liu, Yuekang Li, Ling Shi, Gelei Deng, Shengquan Chen, Kailong Wang
20 May 2024

Papers citing "Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation"

13 papers

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
ALM · 20 Jan 2025

Gen-AI for User Safety: A Survey
Akshar Prabhu Desai, Tejasvi Ravi, Mohammad Luqman, Mohit Sharma, Nithya Kota, Pranjul Yadav
10 Nov 2024

Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
Zhipeng Wei, Yuqi Liu, N. Benjamin Erichson
AAML · 01 Nov 2024

JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
Fan Liu, Yue Feng, Zhao Xu, Lixin Su, Xinyu Ma, Dawei Yin, Hao Liu
ELM · 11 Oct 2024

Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
Tarun Raheja, Nilay Pochhi
AAML · 09 Oct 2024

Efficient Detection of Toxic Prompts in Large Language Models
Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu
21 Aug 2024

GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
Zhibo Zhang, Wuxia Bai, Yuxi Li, Max Q.-H. Meng, Kaidi Wang, Ling Shi, Li Li, Jun Wang, Haoyu Wang
09 Aug 2024

Misinforming LLMs: vulnerabilities, challenges and opportunities
Jaroslaw Kornowicz, Daniel Geissler, Kirsten Thommes
02 Aug 2024

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu, Fan Liu, Hao Liu
AAML · 13 Jun 2024

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze
KELM · 17 Apr 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, ..., Nicolas Flammarion, George J. Pappas, F. Tramèr, Hamed Hassani, Eric Wong
ALM, ELM, AAML · 28 Mar 2024

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang
LLMAG · 05 Mar 2024

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing
SILM · 19 Sep 2023