Bayesian Reward Models for LLM Alignment
arXiv:2402.13210 · 20 February 2024
Adam X. Yang, Maxime Robeyns, Thomas Coste, Zhengyan Shi, Jun Wang, Haitham Bou-Ammar, Laurence Aitchison
Papers citing "Bayesian Reward Models for LLM Alignment" (8 of 8 shown)

Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab, Ruqi Zhang
137 · 0 · 0
17 Apr 2025

Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang
44 · 43 · 0
14 Jun 2024

Asymptotics of Language Model Alignment
Joy Qiping Yang, Salman Salamatian, Ziteng Sun, A. Suresh, Ahmad Beirami
63 · 21 · 0
02 Apr 2024

ODIN: Disentangled Reward Mitigates Hacking in RLHF
Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Dinesh Manocha, Tom Goldstein, Heng-Chiao Huang, M. Shoeybi, Bryan Catanzaro
AAML
47 · 51 · 0
11 Feb 2024

WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé, Nino Vieillard, Léonard Hussenot, Robert Dadashi, Geoffrey Cideron, Olivier Bachem, Johan Ferret
114 · 93 · 0
22 Jan 2024

Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
AI4CE
72 · 32 · 0
30 Dec 2023

Accelerated Linearized Laplace Approximation for Bayesian Deep Learning
Zhijie Deng, Feng Zhou, Jun Zhu
BDL
47 · 19 · 0
23 Oct 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM · ALM
313 · 11,953 · 0
04 Mar 2022