ResearchTrend.AI
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack

9 October 2024
Leo McKee-Reid
Christoph Sträter
Maria Angelica Martinez
Joe Needham
Mikita Balesni
    OffRL

Papers citing "Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack"

Title
No papers