Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes
Amrith Setlur
Zijian Wang
Andrew Cohen
Paria Rashidinejad
Sang Michael Xie
