Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
v1, v2 (latest)

12 March 2024
Wei Shen
Xiaoying Zhang
Yuanshun Yao
Rui Zheng
Hongyi Guo
Yang Liu
    ALM

Papers citing "Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards"

14 / 14 papers shown
PC-Diffusion: Aligning Diffusion Models with Human Preferences via Preference Classifier
S. Wang
He Wang
X. Wei
Longquan Dai
Jinhui Tang
262
0
0
11 Nov 2025
Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference
Matteo Cercola
Valeria Capretti
Simone Formentin
298
3
0
06 Nov 2025
GCPO: When Contrast Fails, Go Gold
Hao Wu
Wei Liu
171
1
0
09 Oct 2025
Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback
Derek Shi
Ruben Glatt
Christine Klymko
Shubham Mohole
Hongjun Choi
Shashank Kushwaha
Sam Sakla
Felipe Leno Da Silva
AI4TS, VLM
226
0
0
02 Oct 2025
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji
Yanqiu Wu
Zhibin Wu
Shoujin Wang
Jian Yang
Mark Dras
Usman Naseem
478
14
0
05 May 2025
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Chris
Yichen Wei
Yi Peng
Xiang Wang
Weijie Qiu
...
Jianhao Zhang
Y. Hao
Xuchen Song
Yang Liu
Yahui Zhou
OffRL, AI4TS, SyDa, LRM, VLM
644
35
0
23 Apr 2025
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab
Ruqi Zhang
1.1K
3
0
17 Apr 2025
Reasoning without Regret
Tarun Chitra
OffRL, LRM
328
0
0
14 Apr 2025
Reward Shaping to Mitigate Reward Hacking in RLHF
Jiayi Fu
Xuandong Zhao
Chengyuan Yao
Han Wang
Qi Han
Yanghua Xiao
736
58
0
26 Feb 2025
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
International Conference on Learning Representations (ICLR), 2024
Yougang Lyu
Lingyong Yan
Zihan Wang
D. Yin
Sudipta Singha Roy
Maarten de Rijke
Zhaochun Ren
680
21
0
10 Oct 2024
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
International Conference on Learning Representations (ICLR), 2024
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua Wu
1.0K
9
0
03 Oct 2024
Reward-Robust RLHF in LLMs
Yuzi Yan
Xingzhou Lou
Jialian Li
Yiping Zhang
Jian Xie
Chao Yu
Yu Wang
Dong Yan
Yuan Shen
489
21
0
18 Sep 2024
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei
Shenghua He
Tian Xia
Andy H. Wong
Jingyang Lin
Mei Han
ALM, ELM
596
71
0
23 Aug 2024
Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen
Guande He
Lifan Yuan
Ganqu Cui
Hang Su
Jun Zhu
467
86
0
08 Feb 2024