2505.20087
Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models
26 May 2025
Makesh Narsimhan Sreedhar
Traian Rebedea
Christopher Parisien
LRM
Papers citing "Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models"
5 / 5 papers shown
Title
X-Guard: Multilingual Guard Agent for Content Moderation
Bibek Upadhayay
Vahid Behzadan
Ph.D
102
3
0
11 Apr 2025
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
Junda Zhu
Lingyong Yan
Shuaiqiang Wang
Dawei Yin
Lei Sha
AAML
LRM
98
6
0
18 Feb 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun Xia
Tianyi Wu
Zhiwei Xue
Yuxiao Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
276
26
0
30 Jan 2025
Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh
Prasoon Varshney
Makesh Narsimhan Sreedhar
Aishwarya Padmakumar
Traian Rebedea
Jibin Rajan Varghese
Christopher Parisien
137
16
0
15 Jan 2025
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Jingyu Zhang
Ahmed Elgohary
Ahmed Magooda
Daniel Khashabi
Benjamin Van Durme
473
8
0
11 Oct 2024