2410.06491
Cited By
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
9 October 2024
Leo McKee-Reid
Christoph Sträter
Maria Angelica Martinez
Joe Needham
Mikita Balesni
OffRL
Papers citing "Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack": none listed.