2505.17650
Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?
23 May 2025
Chengda Lu
Xiaoyu Fan
Yu Huang
Rongwu Xu
Jijie Li
Wei Xu
LRM
Papers citing "Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?"
11 / 11 papers shown
Title
Quantile Regression for Distributional Reward Models in RLHF
Nicolai Dorka
76
24
0
16 Sep 2024
Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles
Xiongtao Sun
Deyue Zhang
Dongdong Yang
Quanchen Zou
Hui Li
AAML
59
18
0
08 Aug 2024
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
M. Russinovich
Ahmed Salem
Ronen Eldan
101
94
0
02 Apr 2024
Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue
Zhenhong Zhou
Jiuyang Xiang
Haopeng Chen
Quan Liu
Zherui Li
Sen Su
81
25
0
27 Feb 2024
DeepInception: Hypnotize Large Language Model to Be Jailbreaker
Xuan Li
Zhanke Zhou
Jianing Zhu
Jiangchao Yao
Tongliang Liu
Bo Han
83
184
0
06 Nov 2023
Jailbreaking Black Box Large Language Models in Twenty Queries
Patrick Chao
Alexander Robey
Yan Sun
Hamed Hassani
George J. Pappas
Eric Wong
AAML
108
690
0
12 Oct 2023
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Xiangyu Qi
Yi Zeng
Tinghao Xie
Pin-Yu Chen
Ruoxi Jia
Prateek Mittal
Peter Henderson
SILM
118
605
0
05 Oct 2023
AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
Xiaogeng Liu
Nan Xu
Muhao Chen
Chaowei Xiao
SILM
77
314
0
03 Oct 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
291
1,455
0
27 Jul 2023
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
513
4,428
0
24 May 2022
Ethical and social risks of harm from Language Models
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
...
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
108
1,036
0
08 Dec 2021