2412.10616
Cited By
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
13 December 2024
Avinandan Bose
Zhihan Xiong
Aadirupa Saha
S. Du
Maryam Fazel
Papers citing
"Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration"
1 / 1 papers shown
Title
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li
Daniel Khashabi
55
0
0
05 May 2025