Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena


9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
    ALM
    OSLM
    ELM

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 2,990 papers shown
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
29
181
0
26 Sep 2023
Integration of Large Language Models within Cognitive Architectures for Autonomous Robots
Miguel Ángel González Santamarta
Francisco J. Rodríguez-Lera
Ángel Manuel Guerrero Higueras
Vicente Matellán Olivera
LLMAG
LM&Ro
34
5
0
26 Sep 2023
Aligning Large Multimodal Models with Factually Augmented RLHF
Zhiqing Sun
Sheng Shen
Shengcao Cao
Haotian Liu
Chunyuan Li
...
Liangyan Gui
Yu-Xiong Wang
Yiming Yang
Kurt Keutzer
Trevor Darrell
VLM
52
325
0
25 Sep 2023
Identifying the Risks of LM Agents with an LM-Emulated Sandbox
Yangjun Ruan
Honghua Dong
Andrew Wang
Silviu Pitis
Yongchao Zhou
Jimmy Ba
Yann Dubois
Chris J. Maddison
Tatsunori Hashimoto
LLMAG
ELM
25
100
0
25 Sep 2023
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision
Haoning Wu
Zicheng Zhang
Erli Zhang
Chaofeng Chen
Liang Liao
...
Chunyi Li
Wenxiu Sun
Qiong Yan
Guangtao Zhai
Weisi Lin
VLM
42
136
0
25 Sep 2023
Prompting and Fine-Tuning Open-Sourced Large Language Models for Stance Classification
Iain J. Cruickshank
Lynnette Hui Xian Ng
49
9
0
24 Sep 2023
ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning
Hosein Hasanbeig
Hiteshi Sharma
Leo Betthauser
Felipe Vieira Frujeri
Ida Momennejad
59
15
0
24 Sep 2023
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Tae Soo Kim
Yoonjoo Lee
Jamin Shin
Young-Ho Kim
Juho Kim
43
71
0
24 Sep 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM
ALM
47
35
0
23 Sep 2023
From Text to Source: Results in Detecting Large Language Model-Generated Content
Wissam Antoun
Benoît Sagot
Djamé Seddah
DeLMO
39
11
0
23 Sep 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
49
31
0
23 Sep 2023
AI Risk Profiles: A Standards Proposal for Pre-Deployment AI Risk Disclosures
E. Sherman
Ian W. Eisenberg
41
5
0
22 Sep 2023
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
Justin Chih-Yao Chen
Swarnadeep Saha
Joey Tianyi Zhou
LLMAG
LRM
54
126
0
22 Sep 2023
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems
Leonardo Ranaldi
Fabio Massimo Zanzotto
39
2
0
21 Sep 2023
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Tianle Li
Siyuan Zhuang
...
Zi Lin
Eric P. Xing
Joseph E. Gonzalez
Ion Stoica
Haotong Zhang
38
183
0
21 Sep 2023
"It's a Fair Game", or Is It? Examining How Users Navigate Disclosure
  Risks and Benefits When Using LLM-Based Conversational Agents
Zhiping Zhang
Michelle Jia
Hao-Ping Lee
Bingsheng Yao
Sauvik Das
Ada Lerner
Dakuo Wang
Tianshi Li
SILM
ELM
24
73
0
20 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
50
180
0
20 Sep 2023
Studying Lobby Influence in the European Parliament
Aswin Suresh
Lazar Radojević
Francesco Salvi
Antoine Magron
Victor Kristof
Matthias Grossglauser
28
0
0
20 Sep 2023
DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services
Shengbin Yue
Wei Chen
Siyuan Wang
Bingxuan Li
Chenchen Shen
...
Yuxuan Zhou
Yao Xiao
Song Yun
Xuanjing Huang
Zhongyu Wei
AILaw
ELM
56
90
0
20 Sep 2023
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Guan-Bo Wang
Sijie Cheng
Xianyuan Zhan
Xiangang Li
Sen Song
Yang Liu
ALM
51
235
0
20 Sep 2023
Are Large Language Models Really Robust to Word-Level Perturbations?
Haoyu Wang
Guozheng Ma
Cong Yu
Ning Gui
Linrui Zhang
...
Sen Zhang
Li Shen
Xueqian Wang
Peilin Zhao
Dacheng Tao
KELM
36
22
0
20 Sep 2023
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch
Juntao Li
Zecheng Tang
Yuyang Ding
Pinzheng Wang
Pei Guo
...
Wenliang Chen
Guohong Fu
Qiaoming Zhu
Guodong Zhou
Hao Fei
59
5
0
19 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
133
143
0
19 Sep 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
124
311
0
19 Sep 2023
LLM4Jobs: Unsupervised occupation extraction and standardization leveraging Large Language Models
Nan Li
Bo Kang
T. D. Bie
39
1
0
18 Sep 2023
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles
Kung-Hsiang Huang
Philippe Laban
Alexander R. Fabbri
Prafulla Kumar Choubey
Shafiq Joty
Caiming Xiong
Chien-Sheng Wu
38
26
0
17 Sep 2023
OWL: A Large Language Model for IT Operations
Hongcheng Guo
Jian Yang
Jiaheng Liu
Liqun Yang
Linzheng Chai
...
Tieqiao Zheng
Liangfan Zheng
Bo Zhang
Ke Xu
Zhoujun Li
VLM
68
41
0
17 Sep 2023
Can Large Language Models Understand Real-World Complex Instructions?
Qi He
Jie Zeng
Wenhao Huang
Lina Chen
Jin Xiao
...
Shisong Chen
Yikai Zhang
Zhouhong Gu
Jiaqing Liang
Yanghua Xiao
ALM
LRM
ELM
98
52
0
17 Sep 2023
Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca
Pinzhen Chen
Shaoxiong Ji
Nikolay Bogoychev
Andrey Kutuzov
Barry Haddow
Kenneth Heafield
57
45
0
16 Sep 2023
PDFTriage: Question Answering over Long, Structured Documents
Jon Saad-Falcon
Joe Barrow
Alexa F. Siu
A. Nenkova
David Seunghyun Yoon
Ryan Rossi
Franck Dernoncourt
RALM
33
20
0
16 Sep 2023
Learning by Self-Explaining
Wolfgang Stammer
Felix Friedrich
David Steinmann
Manuel Brack
Hikaru Shindo
Kristian Kersting
59
7
0
15 Sep 2023
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions
Federico Bianchi
Mirac Suzgun
Giuseppe Attanasio
Paul Röttger
Dan Jurafsky
Tatsunori Hashimoto
James Zou
ALM
LM&MA
LRM
39
193
0
14 Sep 2023
Zero-shot Audio Topic Reranking using Large Language Models
Mengjie Qian
Rao Ma
Adian Liusie
Erfan Loweimi
Kate Knill
Mark Gales
42
1
0
14 Sep 2023
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada
Varun Gumma
Adrian de Wynter
Harshita Diddee
Mohamed Ahmed
Monojit Choudhury
Kalika Bali
Sunayana Sitaram
ALM
LM&MA
ELM
35
63
0
14 Sep 2023
Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization
Dave Van Veen
Cara Van Uden
Louis Blankemeier
Jean-Benoit Delbrouck
Asad Aali
...
C. Langlotz
Jason Hom
S. Gatidis
John M. Pauly
Akshay S. Chaudhari
ELM
AI4MH
LM&MA
65
289
0
14 Sep 2023
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics
Haoqin Tu
Bingchen Zhao
Chen Wei
Cihang Xie
MLLM
46
14
0
13 Sep 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
53
77
0
13 Sep 2023
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
Wei Qi Leong
Jian Gang Ngui
Yosephine Susanto
Hamsawardhini Rengarajan
Kengatharaiyer Sarveswaran
William-Chandra Tjhi
40
9
0
12 Sep 2023
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Xiang Yue
Xingwei Qu
Ge Zhang
Yao Fu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
AIMat
LRM
85
372
0
11 Sep 2023
Textbooks Are All You Need II: phi-1.5 technical report
Yuanzhi Li
Sébastien Bubeck
Ronen Eldan
Allison Del Giorno
Suriya Gunasekar
Yin Tat Lee
ALM
LRM
59
451
0
11 Sep 2023
Decolonial AI Alignment: Openness, Viśeṣa-Dharma, and Including Excluded Knowledges
Kush R. Varshney
49
2
0
10 Sep 2023
Leveraging Large Language Models for Exploiting ASR Uncertainty
Pranay Dighe
Yi Su
Shangshang Zheng
Yunshu Liu
Vineet Garg
Xiaochuan Niu
Ahmed H. Tewfik
18
12
0
09 Sep 2023
SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning
Bin Wang
Zhengyuan Liu
Xin Huang
Fangkai Jiao
Yang Ding
Ai Ti Aw
Nancy F. Chen
LRM
37
63
0
09 Sep 2023
Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation
Jiatong Li
Rui Li
Qi Liu
39
15
0
08 Sep 2023
NESTLE: a No-Code Tool for Statistical Analysis of Legal Corpus
Kyoungyeon Cho
Seungkum Han
Young Rok Choi
Wonseok Hwang
ELM
AILaw
39
0
0
08 Sep 2023
FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Sarah Schwettmann
Tamar Rott Shaham
Joanna Materzyńska
Neil Chowdhury
Shuang Li
Jacob Andreas
David Bau
Antonio Torralba
18
20
0
07 Sep 2023
Large Language Models Are Not Robust Multiple Choice Selectors
Chujie Zheng
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
41
218
0
07 Sep 2023
FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
Li Du
Bowen Qin
Zheng Zhang
Aixin Sun
Yequan Wang
60
22
0
07 Sep 2023
Evaluating ChatGPT as a Recommender System: A Rigorous Approach
Dario Di Palma
Giovanni Maria Biancofiore
Vito Walter Anelli
Fedelucio Narducci
Tommaso Di Noia
E. Sciascio
ALM
59
28
0
07 Sep 2023
XGen-7B Technical Report
Erik Nijkamp
Tian Xie
Hiroaki Hayashi
Bo Pang
Congying Xia
...
Chien-Sheng Wu
Silvio Savarese
Yingbo Zhou
Shafiq Joty
Caiming Xiong
ALM
34
13
0
07 Sep 2023