2502.05163
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
7 February 2025
Yihe Deng
Yu Yang
Junkai Zhang
Wei Wang
B. Li
OffRL
Papers citing
"DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails"
4 / 4 papers shown
Title
Wolf Hidden in Sheep's Conversations: Toward Harmless Data-Based Backdoor Attacks for Jailbreaking Large Language Models
Jiawei Kong
Hao Fang
Xiaochen Yang
Kuofeng Gao
Bin Chen
Shu-Tao Xia
Yaowei Wang
Min Zhang
AAML
56
0
0
23 May 2025
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
Yahan Yang
Soham Dan
Shuo Li
Dan Roth
Insup Lee
LRM
71
0
0
21 Apr 2025
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Priyanshu Kumar
Devansh Jain
Akhila Yerukola
Liwei Jiang
Himanshu Beniwal
Thomas Hartvigsen
Maarten Sap
91
1
0
06 Apr 2025
Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries
David Noever
Grant Rosario
297
0
0
20 Feb 2025