ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.13457
  4. Cited By
A Comprehensive Study of Jailbreak Attack versus Defense for Large
  Language Models

A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models

21 February 2024
Zihao Xu
Yi Liu
Gelei Deng
Yuekang Li
S. Picek
    PILM
    AAML
ArXivPDFHTML

Papers citing "A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models"

13 / 13 papers shown
Title
Adversarial Suffix Filtering: a Defense Pipeline for LLMs
Adversarial Suffix Filtering: a Defense Pipeline for LLMs
David Khachaturov
Robert D. Mullins
AAML
26
0
0
14 May 2025
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents
Christian Schroeder de Witt
AAML
AI4CE
156
1
0
04 May 2025
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang
Ke Ma
Xiaojun Jia
Yingfei Sun
Qianqian Xu
Qingming Huang
AAML
162
0
0
03 May 2025
Good News for Script Kiddies? Evaluating Large Language Models for Automated Exploit Generation
Good News for Script Kiddies? Evaluating Large Language Models for Automated Exploit Generation
David Jin
Qian Fu
Yuekang Li
29
0
0
02 May 2025
StyleRec: A Benchmark Dataset for Prompt Recovery in Writing Style Transformation
StyleRec: A Benchmark Dataset for Prompt Recovery in Writing Style Transformation
Shenyang Liu
Yang Gao
Shaoyan Zhai
Liqiang Wang
32
0
0
06 Apr 2025
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta
David Khachaturov
Robert D. Mullins
AAML
AuLLM
65
1
0
02 Feb 2025
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
Jiawei Zhao
Kejiang Chen
Weinan Zhang
Nenghai Yu
AAML
40
0
0
03 Nov 2024
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin
Sadhika Malladi
Adithya Bhaskar
Danqi Chen
Sanjeev Arora
Boris Hanin
93
16
0
11 Oct 2024
Recent Advances in Attack and Defense Approaches of Large Language
  Models
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui
Yishi Xu
Zhewei Huang
Shuchang Zhou
Jianbin Jiao
Junge Zhang
PILM
AAML
57
1
0
05 Sep 2024
Don't Say No: Jailbreaking LLM by Suppressing Refusal
Don't Say No: Jailbreaking LLM by Suppressing Refusal
Yukai Zhou
Wenjie Wang
AAML
42
15
0
25 Apr 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated
  Jailbreak Prompts
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
117
301
0
19 Sep 2023
Opportunities and Challenges for ChatGPT and Large Language Models in
  Biomedicine and Health
Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health
Shubo Tian
Qiao Jin
Lana Yeganova
Po-Ting Lai
Qingqing Zhu
...
Donald C. Comeau
R. Islamaj
Aadit Kapoor
Xin Gao
Zhiyong Lu
LM&MA
MedIm
AI4MH
109
210
0
15 Jun 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
333
11,953
0
04 Mar 2022
1