ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning

24 December 2024
Alex Beutel
Kai Y. Xiao
Johannes Heidecke
Lilian Weng
    AAML

Papers citing "Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning"

2 / 2 papers shown
When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines
Sachin R. Pendse
Darren Gergle
Rachel Kornfield
J. Meyerhoff
David C. Mohr
Jina Suh
Annie Wescott
Casey Williams
J. Schleider
29 Apr 2025
Jailbreaking to Jailbreak
Jeremy Kritz
Vaughn Robinson
Robert Vacareanu
Bijan Varjavand
Michael Choi
Bobby Gogov
Scale Red Team
Summer Yue
Willow Primack
Zifan Wang
09 Feb 2025