Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes

Amrith Setlur
Zijian Wang
Andrew Cohen
Paria Rashidinejad
Sang Michael Xie
