Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.02764
Cited By
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
4 June 2024
Ilgee Hong
Zichong Li
Alexander Bukharin
Yixiao Li
Haoming Jiang
Tianbao Yang
Tuo Zhao
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Adaptive Preference Scaling for Reinforcement Learning with Human Feedback"
6 / 6 papers shown
Title
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
Xuerui Su
Shufang Xie
Guoqing Liu
Yingce Xia
Renqian Luo
Peiran Jin
Zhiming Ma
Yue Wang
Zun Wang
Yuting Liu
LRM
31
1
0
06 Apr 2025
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye
Hongyi Zhou
Jin Zhu
Francesco Quinzan
C. Shi
32
1
0
03 Apr 2025
Distributionally Robust Reinforcement Learning with Human Feedback
Debmalya Mandal
Paulius Sasnauskas
Goran Radanović
39
1
0
01 Mar 2025
Stochastic Constrained DRO with a Complexity Independent of Sample Size
Q. Qi
Jiameng Lyu
Kung-Sik Chan
E. Bai
Tianbao Yang
50
15
0
11 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Semantically Distributed Robust Optimization for Vision-and-Language Inference
Tejas Gokhale
A. Chaudhary
Pratyay Banerjee
Chitta Baral
Yezhou Yang
46
17
0
14 Oct 2021
1