Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.08491
Cited By
v1
v2 (latest)
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
12 October 2023
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
Hwaran Lee
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Prometheus: Inducing Fine-grained Evaluation Capability in Language Models"
24 / 74 papers shown
Title
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min Lin
89
13
0
09 Oct 2024
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang
Yufei Wang
Tiezheng YU
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
120
7
0
07 Oct 2024
Better Instruction-Following Through Minimum Bayes Risk
Ian Wu
Patrick Fernandes
Amanda Bertsch
Seungone Kim
Sina Pakazad
Graham Neubig
135
11
0
03 Oct 2024
Robust LLM safeguarding via refusal feature adversarial training
L. Yu
Virginie Do
Karen Hambardzumyan
Nicola Cancedda
AAML
148
19
0
30 Sep 2024
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang
Dading Chong
Feng Jiang
Chengguang Tang
Anningzhe Gao
Guohua Tang
Haizhou Li
ALM
105
2
0
20 Sep 2024
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
LLMAG
183
7
0
13 Sep 2024
Your Weak LLM is Secretly a Strong Teacher for Alignment
Leitian Tao
Yixuan Li
145
9
0
13 Sep 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
Sacha Muller
António Loison
Bilel Omrani
Gautier Viaud
RALM
ELM
104
2
0
10 Sep 2024
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks
Andreas Stephan
D. Zhu
Matthias Aßenmacher
Xiaoyu Shen
Benjamin Roth
ELM
125
6
0
06 Sep 2024
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
185
12
0
27 Aug 2024
DataGen: Unified Synthetic Dataset Generation via Large Language Models
Yue Huang
Siyuan Wu
Chujie Gao
Dongping Chen
Qihui Zhang
...
Tianyi Zhou
Xiangliang Zhang
Jianfeng Gao
Chaowei Xiao
Lichao Sun
SyDa
116
20
0
27 Jun 2024
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
Yuhang Wu
Wenmeng Yu
Yean Cheng
Yan Wang
Xiaohan Zhang
Jiazheng Xu
Ming Ding
Yuxiao Dong
93
2
0
13 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
204
44
0
09 Jun 2024
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Zicheng Zhang
H. Wu
Chunyi Li
Yingjie Zhou
Wei Sun
Xiongkuo Min
Zijian Chen
Xiaohong Liu
Weisi Lin
Guangtao Zhai
EGVM
145
18
0
05 Jun 2024
Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging
Xiaobo Liang
Haoke Zhang
Helan hu
Juntao Li
Jun Xu
Min Zhang
ALM
72
3
0
20 May 2024
ContextQ: Generated Questions to Support Meaningful Parent-Child Dialogue While Co-Reading
Griffin Dietz Smith
Siddhartha Prasad
Matt J. Davidson
Leah Findlater
R. Benjamin Shapiro
83
8
0
06 May 2024
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Yijia Shao
Yucheng Jiang
Theodore A. Kanell
Peter Xu
Omar Khattab
Monica S. Lam
LLMAG
KELM
116
51
0
22 Feb 2024
DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection
Herun Wan
Shangbin Feng
Zhaoxuan Tan
Heng Wang
Yulia Tsvetkov
Minnan Luo
137
34
0
16 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
208
41
0
02 Feb 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
399
338
0
18 Jan 2024
The Critique of Critique
Shichao Sun
Junlong Li
Weizhe Yuan
Ruifeng Yuan
Wenjie Li
Pengfei Liu
ELM
71
0
0
09 Jan 2024
LitSumm: Large language models for literature summarisation of non-coding RNAs
Andrew Green
C. Ribas
Nancy Ontiveros-Palacios
Sam Griffiths-Jones
Anton I. Petrov
Alex Bateman
Blake Sweeney
123
4
0
06 Nov 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
156
142
0
26 Oct 2023
Retrieving Evidence from EHRs with LLMs: Possibilities and Challenges
Hiba Ahsan
Denis Jered McInerney
Jisoo Kim
Christopher Potter
Geoffrey S. Young
Silvio Amir
Byron C. Wallace
63
12
0
08 Sep 2023
Previous
1
2