Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.18370
Cited By
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
25 July 2024
Jaehun Jung
Faeze Brahman
Yejin Choi
ALM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement"
14 / 14 papers shown
Title
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM
ALM
102
10
0
17 Oct 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
139
65
0
18 Jun 2024
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Pat Verga
Sebastian Hofstatter
Sophia Althammer
Yixuan Su
Aleksandra Piktus
Arkady Arkhangorodsky
Minjie Xu
Naomi White
Patrick Lewis
ALM
ELM
102
104
0
29 Apr 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
A. Menon
Sanjiv Kumar
UQLM
126
55
0
15 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
122
400
0
06 Apr 2024
R-Tuning: Instructing Large Language Models to Say `I Don't Know'
Hanning Zhang
Shizhe Diao
Yong Lin
Yi R. Fung
Qing Lian
Xingyao Wang
Yangyi Chen
Heng Ji
Tong Zhang
UQLM
98
46
0
16 Nov 2023
Fine-tuning Language Models for Factuality
Katherine Tian
Eric Mitchell
Huaxiu Yao
Christopher D. Manning
Chelsea Finn
KELM
HILM
SyDa
75
184
0
14 Nov 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
113
142
0
26 Oct 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
408
4,422
0
09 Jun 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
132
605
0
22 May 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,699
0
15 Mar 2023
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
Anastasios Nikolas Angelopoulos
Stephen Bates
Emmanuel J. Candès
Michael I. Jordan
Lihua Lei
271
134
0
03 Oct 2021
Distribution-Free, Risk-Controlling Prediction Sets
Stephen Bates
Anastasios Nikolas Angelopoulos
Lihua Lei
Jitendra Malik
Michael I. Jordan
OOD
263
199
0
07 Jan 2021
Selective Classification for Deep Neural Networks
Yonatan Geifman
Ran El-Yaniv
CVBM
95
529
0
23 May 2017
1