arXiv:2410.02197
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
3 October 2024
Yifan Zhang, Ge Zhang, Yue Wu, Kangping Xu, Quanquan Gu
Papers citing "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment"
2 / 2 papers shown
Title
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
25
1
0
03 Apr 2025
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang, Shengyu Zhang, J. Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, G. Wang, Eduard H. Hovy
OffRL
134
7
0
05 Dec 2024