ResearchTrend.AI

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

10 October 2024
Shenao Zhang
Zhihan Liu
Boyi Liu
Yuhang Zhang
Yingxiang Yang
Y. Liu
Liyu Chen
Tao Sun
Ziyi Wang

Papers citing "Reward-Augmented Data Enhances Direct Preference Alignment of LLMs"

2 / 2 papers shown
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
Zhihan Liu
Shenao Zhang
Yongfei Liu
Boyi Liu
Yingxiang Yang
Zhaoran Wang
113
2
0
20 Nov 2024
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
37
2
0
13 Jun 2024