Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment
Papers citing "Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment"
37 / 37 papers shown
Title | Authors |
---|---|
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi |