ResearchTrend.AI
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

4 June 2024
Ilgee Hong
Zichong Li
Alexander Bukharin
Yixiao Li
Haoming Jiang
Tianbao Yang
Tuo Zhao

Papers citing "Adaptive Preference Scaling for Reinforcement Learning with Human Feedback"

6 / 6 papers shown
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
Xuerui Su
Shufang Xie
Guoqing Liu
Yingce Xia
Renqian Luo
Peiran Jin
Zhiming Ma
Yue Wang
Zun Wang
Yuting Liu
LRM
31
1
0
06 Apr 2025
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye
Hongyi Zhou
Jin Zhu
Francesco Quinzan
C. Shi
32
1
0
03 Apr 2025
Distributionally Robust Reinforcement Learning with Human Feedback
Debmalya Mandal
Paulius Sasnauskas
Goran Radanović
39
1
0
01 Mar 2025
Stochastic Constrained DRO with a Complexity Independent of Sample Size
Q. Qi
Jiameng Lyu
Kung-Sik Chan
E. Bai
Tianbao Yang
50
15
0
11 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Semantically Distributed Robust Optimization for Vision-and-Language Inference
Tejas Gokhale
A. Chaudhary
Pratyay Banerjee
Chitta Baral
Yezhou Yang
46
17
0
14 Oct 2021