ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2412.10616
  4. Cited By
Hybrid Preference Optimization for Alignment: Provably Faster
  Convergence Rates by Combining Offline Preferences with Online Exploration

Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration

13 December 2024
Avinandan Bose
Zhihan Xiong
Aadirupa Saha
S. Du
Maryam Fazel
ArXivPDFHTML

Papers citing "Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration"

1 / 1 papers shown
Title
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li
Daniel Khashabi
55
0
0
05 May 2025
1