Cited By
Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training
23 October 2025
Zheng-Xin Yong
Stephen H. Bach
LRM
arXiv (abs): 2510.20956
PDF
HTML
GitHub
Papers citing "Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training"
No citing papers found.