2306.05685
Cited By
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
Papers citing
"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"
50 / 3,057 papers shown
Title
Secrets of RLHF in Large Language Models Part I: PPO
Rui Zheng
Shihan Dou
Songyang Gao
Yuan Hua
Wei Shen
...
Hang Yan
Tao Gui
Qi Zhang
Xipeng Qiu
Xuanjing Huang
ALM
OffRL
55
160
0
11 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
92
228
0
07 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
85
1,578
0
06 Jul 2023
Style Over Substance: Evaluation Biases for Large Language Models
Minghao Wu
Alham Fikri Aji
ALM
ELM
59
44
0
06 Jul 2023
What Should Data Science Education Do with Large Language Models?
Xinming Tu
James Zou
Weijie J. Su
Linjun Zhang
AI4Ed
47
35
0
06 Jul 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Deepanway Ghosal
Yew Ken Chia
Navonil Majumder
Soujanya Poria
ALM
LRM
38
17
0
05 Jul 2023
Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
Jinhao Duan
Hao-Ran Cheng
Shiqi Wang
Alex Zavalny
Chenan Wang
Renjing Xu
B. Kailkhura
Kaidi Xu
61
40
0
03 Jul 2023
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
49
43
0
03 Jul 2023
Preference Ranking Optimization for Human Alignment
Feifan Song
Yu Bowen
Minghao Li
Haiyang Yu
Fei Huang
Yongbin Li
Houfeng Wang
ALM
34
243
0
30 Jun 2023
On the Exploitability of Instruction Tuning
Manli Shu
Jiong Wang
Chen Zhu
Jonas Geiping
Chaowei Xiao
Tom Goldstein
SILM
64
93
0
28 Jun 2023
Composing Parameter-Efficient Modules with Arithmetic Operations
Jinghan Zhang
Shiqi Chen
Junteng Liu
Junxian He
KELM
MoMe
40
114
0
26 Jun 2023
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
74
268
0
24 Jun 2023
Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping
Daniel Zou
X. Jin
Xueyang Yu
Haotian Zhang
J. Demmel
MoE
57
0
0
24 Jun 2023
Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities
Xudong Shen
H. Brown
Jiashu Tao
Martin Strobel
Yao Tong
Akshay Narayan
Harold Soh
Finale Doshi-Velez
49
3
0
22 Jun 2023
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
Shizhe Diao
Rui Pan
Hanze Dong
Kashun Shum
Jipeng Zhang
Wei Xiong
Tong Zhang
ALM
42
64
0
21 Jun 2023
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts
Xuan-Phi Nguyen
Sharifah Mahani Aljunied
Shafiq Joty
Lidong Bing
55
33
0
20 Jun 2023
CHORUS: Foundation Models for Unified Data Discovery and Exploration
Moe Kayali
A. Lykov
Ilias Fountalis
N. Vasiloglou
Dan Olteanu
Dan Suciu
46
22
0
16 Jun 2023
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
...
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
ELM
ALM
44
67
0
15 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
44
162
0
15 Jun 2023
MiniLLM: Knowledge Distillation of Large Language Models
Yuxian Gu
Li Dong
Furu Wei
Minlie Huang
ALM
58
77
0
14 Jun 2023
Model Spider: Learning to Rank Pre-Trained Models Efficiently
Yi-Kai Zhang
Ting Huang
Yao-Xiang Ding
De-Chuan Zhan
Han-Jia Ye
46
25
0
06 Jun 2023
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
Dongfu Jiang
Xiang Ren
Bill Yuchen Lin
ELM
27
288
0
05 Jun 2023
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft
Shalev Lifshitz
Keiran Paster
Harris Chan
Jimmy Ba
Sheila A. McIlraith
LM&Ro
45
69
0
01 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li
Cliff Wong
Sheng Zhang
Naoto Usuyama
Haotian Liu
Jianwei Yang
Tristan Naumann
Hoifung Poon
Jianfeng Gao
LM&MA
MedIm
65
728
0
01 Jun 2023
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap
Q. V. Liao
Ziang Xiao
ALM
ELM
73
32
0
01 Jun 2023
Large Language Models are not Fair Evaluators
Peiyi Wang
Lei Li
Liang Chen
Zefan Cai
Dawei Zhu
Binghuai Lin
Yunbo Cao
Qi Liu
Tianyu Liu
Zhifang Sui
ALM
58
538
0
29 May 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu
Barlas Oğuz
Changsheng Zhao
Ernie Chang
Pierre Stock
Yashar Mehdad
Yangyang Shi
Raghuraman Krishnamoorthi
Vikas Chandra
MQ
67
196
0
29 May 2023
Lawyer LLaMA Technical Report
Quzhe Huang
Mingxu Tao
Chen Zhang
Zhenwei An
Cong Jiang
Zhibin Chen
Zirui Wu
Yansong Feng
ELM
ALM
AILaw
58
50
0
24 May 2023
In-Context Impersonation Reveals Large Language Models' Strengths and Biases
Leonard Salewski
Stephan Alaniz
Isabel Rio-Torto
Eric Schulz
Zeynep Akata
49
152
0
24 May 2023
Automatic Model Selection with Large Language Models for Reasoning
Xu Zhao
Yuxi Xie
Kenji Kawaguchi
Junxian He
Qizhe Xie
ReLM
LRM
42
40
0
23 May 2023
Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation
Da Yin
Xiao Liu
Fan Yin
Ming Zhong
Hritik Bansal
Jiawei Han
Kai-Wei Chang
ALM
42
37
0
23 May 2023
QTSumm: Query-Focused Summarization over Tabular Data
Yilun Zhao
Zhenting Qi
Linyong Nan
Boyu Mi
Yixin Liu
...
Ruizhe Chen
Xiangru Tang
Yumo Xu
Dragomir R. Radev
Arman Cohan
RALM
LMTD
48
1
0
23 May 2023
INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback
Wenda Xu
Danqing Wang
Liangming Pan
Zhenqiao Song
Markus Freitag
Wenjie Wang
Lei Li
ALM
ELM
46
18
0
23 May 2023
Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting
Rui Wang
Hongru Wang
Fei Mi
Yi Chen
Boyang Xue
Kam-Fai Wong
Rui-Lan Xu
53
14
0
23 May 2023
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
Jian Xie
Kai Zhang
Jiangjie Chen
Renze Lou
Yu-Chuan Su
RALM
230
160
0
22 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
53
562
0
22 May 2023
CLASS: A Design Framework for building Intelligent Tutoring Systems based on Learning Science principles
Shashank Sonkar
Lucy Liu
D. B. Mallick
Richard G. Baraniuk
80
39
0
22 May 2023
DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection
Xiao Yu
Yuang Qi
Kejiang Chen
Guoqiang Chen
Xi Yang
Pengyuan Zhu
Xiuwei Shang
Weiming Zhang
Neng H. Yu
DeLMO
30
11
0
21 May 2023
Evaluating the Performance of Large Language Models on GAOKAO Benchmark
Xiaotian Zhang
Chun-yan Li
Yi Zong
Zhengyu Ying
Liang He
Xipeng Qiu
ALM
ELM
45
101
0
21 May 2023
InstructIE: A Bilingual Instruction-based Information Extraction Dataset
Honghao Gui
Shuofei Qiao
Jintian Zhang
Hongbin Ye
Mengshu Sun
Lei Liang
Jeff Z. Pan
Huajun Chen
Ningyu Zhang
39
7
0
19 May 2023
Automatic Evaluation of Attribution by Large Language Models
Xiang Yue
Boshi Wang
Ziru Chen
Kai Zhang
Yu-Chuan Su
Huan Sun
ALM
LRM
HILM
46
56
0
10 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
233
586
0
03 May 2023
A Comprehensive Evaluation of Neural SPARQL Query Generation from Natural Language Questions
Papa Abdou Karim Karou Diallo
Samuel Reyd
Amal Zouaq
18
7
0
16 Apr 2023
Multi-step Jailbreaking Privacy Attacks on ChatGPT
Haoran Li
Dadi Guo
Wei Fan
Mingshi Xu
Jie Huang
Fanpu Meng
Yangqiu Song
SILM
75
329
0
11 Apr 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
175
599
0
06 Apr 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
120
1,118
0
29 Mar 2023
Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models
Qingyu Lu
Baopu Qiu
Liang Ding
Liping Xie
Tom Kocmi
Dacheng Tao
LRM
ALM
ELM
31
111
0
24 Mar 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
445
2,232
0
22 Mar 2023
SAINE: Scientific Annotation and Inference Engine of Scientific Research
Susie Xi Rao
Yi-Lin Tu
P. Egger
32
1
0
28 Feb 2023
Guiding Large Language Models via Directional Stimulus Prompting
Zekun Li
Baolin Peng
Pengcheng He
Michel Galley
Jianfeng Gao
Xi Yan
LLMAG
LRM
LM&Ro
45
96
0
22 Feb 2023