Uncertainty Estimation for Language Reward Models

14 March 2022

Adam Gleave

Papers citing or cited by "Uncertainty Estimation for Language Reward Models"

On the Robustness of Reward Models for Language Model Alignment. Jiwoo Hong, Noah Lee, Eunki Kim, Guijin Son, Woojin Chung, Aman Gupta, Shao Tang, James Thorne. 12 May 2025. (29, 0, 0)
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models. Keming Lu, Hongyi Yuan, Runji Lin, Junyang Lin, Zheng Yuan, Chang Zhou, Jingren Zhou. Tags: MoE, LRM. 15 Nov 2023. (42, 52, 0)
Scaling Laws for Reward Model Overoptimization. Leo Gao, John Schulman, Jacob Hilton. Tags: ALM. 19 Oct 2022. (41, 481, 0)
To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models. Julius Gonsior, C. Falkenberg, Silvio Magino, Anja Reusch, Maik Thiele, Wolfgang Lehner. Tags: UQCV. 06 Oct 2022. (36, 7, 0)
Calibration of Pre-trained Transformers. Shrey Desai, Greg Durrett. Tags: UQLM. 17 Mar 2020. (243, 290, 0)
Fine-Tuning Language Models from Human Preferences. Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving. Tags: ALM. 18 Sep 2019. (298, 1,610, 0)
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell. Tags: UQCV, BDL. 05 Dec 2016. (276, 5,675, 0)
Teaching Machines to Read and Comprehend. Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, L. Espeholt, W. Kay, Mustafa Suleyman, Phil Blunsom. 10 Jun 2015. (184, 3,513, 0)
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Y. Gal, Zoubin Ghahramani. Tags: UQCV, BDL. 06 Jun 2015. (285, 9,145, 0)