Title |
---|
![]() RRM: Robust Reward Model Training Mitigates Reward Hacking Tianqi Liu Wei Xiong Jie Jessie Ren Lichang Chen Junru Wu ...Yuan Liu Bilal Piot Abe Ittycheriah Aviral Kumar Mohammad Saleh |
![]() What makes math problems hard for reinforcement learning: a case study Ali Shehper A. Medina-Mardones Lucas Fagan Angus Gruen Piotr Kucharski Sergei Gukov Piotr Kucharski Zhenghan Wang Sergei Gukov |