arXiv:2510.21285 (v2)
When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails
24 October 2025
Yingzhi Mao, Chunkang Zhang, Junxiang Wang, Xinyan Guan, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
ArXiv (abs)
PDF
HTML
HuggingFace
GitHub