
Title |
|---|
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models. International Conference on Learning Representations (ICLR), 2024 |
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF. International Conference on Learning Representations (ICLR), 2024 |