ResearchTrend.AI
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

7 February 2025
Yihe Deng
Yu Yang
Junkai Zhang
Wei Wang
B. Li
    OffRL
arXiv: 2502.05163

Papers citing "DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails"

4 / 4 papers shown
Wolf Hidden in Sheep's Conversations: Toward Harmless Data-Based Backdoor Attacks for Jailbreaking Large Language Models
Jiawei Kong
Hao Fang
Xiaochen Yang
Kuofeng Gao
Bin Chen
Shu-Tao Xia
Yaowei Wang
Min Zhang
AAML
56
0
0
23 May 2025
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
Yahan Yang
Soham Dan
Shuo Li
Dan Roth
Insup Lee
LRM
71
0
0
21 Apr 2025
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Priyanshu Kumar
Devansh Jain
Akhila Yerukola
Liwei Jiang
Himanshu Beniwal
Thomas Hartvigsen
Maarten Sap
91
1
0
06 Apr 2025
Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries
David Noever
Grant Rosario
297
0
0
20 Feb 2025