Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena


9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
    ALM
    OSLM
    ELM
arXiv: 2306.05685

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 3,054 papers shown
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
Dingli Yu
Simran Kaur
Arushi Gupta
Jonah Brown-Cohen
Anirudh Goyal
Sanjeev Arora
ALM
LLMAG
29
37
0
26 Oct 2023
ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation
Zi Lin
Zihan Wang
Yongqi Tong
Yangkun Wang
Yuxin Guo
Yujia Wang
Jingbo Shang
AI4MH
18
97
0
26 Oct 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
78
123
0
26 Oct 2023
Zephyr: Direct Distillation of LM Alignment
Lewis Tunstall
E. Beeching
Nathan Lambert
Nazneen Rajani
Kashif Rasul
...
Nathan Habib
Nathan Sarrazin
Omar Sanseviero
Alexander M. Rush
Thomas Wolf
ALM
43
381
0
25 Oct 2023
OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models
Mingfeng Xue
Dayiheng Liu
Kexin Yang
Guanting Dong
Wenqiang Lei
Zheng Yuan
Chang Zhou
Jingren Zhou
LLMAG
37
2
0
25 Oct 2023
Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models
Xiang Chen
Xiaojun Wan
32
0
0
25 Oct 2023
RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models
Zefan Wang
Zichuan Liu
Yingying Zhang
Aoxiao Zhong
Lunting Fan
Lingfei Wu
Qingsong Wen
64
25
0
25 Oct 2023
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment
Jixiang Hong
Quan Tu
Cai Chen
Xing Gao
Ji Zhang
Rui Yan
ALM
39
11
0
25 Oct 2023
Can You Follow Me? Testing Situational Understanding in ChatGPT
Chenghao Yang
Allyson Ettinger
LRM
LLMAG
ELM
123
4
0
24 Oct 2023
NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes
Junda Wang
Zonghai Yao
Zhichao Yang
Huixue Zhou
Rumeng Li
Xun Wang
Yucheng Xu
Hong-ye Yu
LM&MA
40
22
0
24 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
46
52
0
23 Oct 2023
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Swarnadeep Saha
Omer Levy
Asli Celikyilmaz
Mohit Bansal
Jason Weston
Xian Li
MoMe
48
73
0
23 Oct 2023
Exploring the Boundaries of GPT-4 in Radiology
Qianchu Liu
Stephanie L. Hyland
Shruthi Bannur
Kenza Bouzid
Daniel Coelho De Castro
...
Anja Thieme
A. Nori
M. Lungren
Ozan Oktay
Javier Alvarez-Valle
LM&MA
AI4CE
56
37
0
23 Oct 2023
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
M. Boubdir
Edward Kim
Beyza Ermis
Marzieh Fadaee
Sara Hooker
ALM
41
18
0
22 Oct 2023
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj
Soujanya Poria
ALM
57
17
0
22 Oct 2023
Teaching Language Models to Self-Improve through Interactive Demonstrations
Xiao Yu
Baolin Peng
Michel Galley
Jianfeng Gao
Zhou Yu
LRM
ReLM
53
20
0
20 Oct 2023
Explicit Alignment and Many-to-many Entailment Based Reasoning for Conversational Machine Reading
Yangyang Luo
Shiyu Tian
Caixia Yuan
Fangkun Zhao
38
1
0
20 Oct 2023
An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu
Keyi Kong
Ning Liu
Li-zhen Cui
Di Wang
Jingfeng Zhang
Mohan Kankanhalli
AAML
SILM
67
73
0
20 Oct 2023
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Eric Mitchell
Rafael Rafailov
Archit Sharma
Chelsea Finn
Christopher D. Manning
ALM
46
53
0
19 Oct 2023
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Aohan Zeng
Mingdao Liu
Rui Lu
Bowen Wang
Xiao Liu
Yuxiao Dong
Jie Tang
LM&MA
ALM
LLMAG
32
166
0
19 Oct 2023
Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher
Xiang Shi
Jiawei Liu
Yinpeng Liu
Qikai Cheng
Wei Lu
RALM
HILM
KELM
29
6
0
19 Oct 2023
Large Language Models for Code Analysis: Do LLMs Really Do Their Job?
Chongzhou Fang
Ning Miao
Shaurya Srivastav
Jialin Liu
Ruoyu Zhang
...
Asmita Asmita
Ryan Tsang
Najmeh Nazari
Han Wang
Houman Homayoun
38
41
0
18 Oct 2023
InferDPT: Privacy-Preserving Inference for Black-box Large Language Model
Meng Tong
Kejiang Chen
Jie Zhang
Yuang Qi
Weiming Zhang
Neng H. Yu
Tianwei Zhang
Zhikun Zhang
SILM
58
2
0
18 Oct 2023
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Rui Zheng
Wei Shen
Yuan Hua
Wenbin Lai
Shihan Dou
...
Xiao Wang
Haoran Huang
Tao Gui
Qi Zhang
Xuanjing Huang
61
14
0
18 Oct 2023
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
Ming Li
Lichang Chen
Jiuhai Chen
Shwai He
Heng-Chiao Huang
Jiuxiang Gu
Dinesh Manocha
113
24
0
18 Oct 2023
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
Joel Jang
Seungone Kim
Bill Yuchen Lin
Yizhong Wang
Jack Hessel
Luke Zettlemoyer
Hannaneh Hajishirzi
Yejin Choi
Prithviraj Ammanabrolu
MoMe
74
137
0
17 Oct 2023
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Yaofang Liu
Xiaodong Cun
Xuebo Liu
Xintao Wang
Yong Zhang
Haoxin Chen
Yang Liu
Tieyong Zeng
Raymond H. F. Chan
Ying Shan
VGen
EGVM
41
131
0
17 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
38
5
0
17 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
172
155
0
16 Oct 2023
OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie
Fan Zhou
Zhoujun Cheng
Peng Shi
Luoxuan Weng
...
Yiheng Xu
Hongjin Su
Dongchan Shin
Caiming Xiong
Tao Yu
LLMAG
58
93
0
16 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
35
56
0
16 Oct 2023
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
Kai Chen
Chunwei Wang
Kuo Yang
Jianhua Han
Lanqing Hong
...
Zhenguo Li
Dit-Yan Yeung
Lifeng Shang
Xin Jiang
Qun Liu
78
33
0
16 Oct 2023
Theory of Mind for Multi-Agent Collaboration via Large Language Models
Huao Li
Yu Quan Chong
Simon Stepputtis
Joseph Campbell
Dana Hughes
Michael Lewis
Katia Sycara
LLMAG
36
64
0
16 Oct 2023
Verbosity Bias in Preference Labeling by Large Language Models
Keita Saito
Akifumi Wachi
Koki Wataoka
Youhei Akimoto
ALM
18
31
0
16 Oct 2023
ACES: Generating Diverse Programming Puzzles with Autotelic Generative Models
Julien Pourcel
Cédric Colas
Gaia Molinaro
Pierre-Yves Oudeyer
Laetitia Teodorescu
46
2
0
15 Oct 2023
Assessing the Reliability of Large Language Model Knowledge
Weixuan Wang
Barry Haddow
Alexandra Birch
Wei Peng
KELM
HILM
80
15
0
15 Oct 2023
How Good is ChatGPT in Giving Advice on Your Visualization Design?
Nam Wook Kim
Grace Myers
Benjamin Bach
33
21
0
14 Oct 2023
Instruction Tuning with Human Curriculum
Bruce W. Lee
Hyunsoo Cho
Kang Min Yoo
52
3
0
14 Oct 2023
A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
Shengyao Zhuang
Honglei Zhuang
Bevan Koopman
Guido Zuccon
55
25
0
14 Oct 2023
End-to-end Story Plot Generator
Hanlin Zhu
Andrew Cohen
Danqing Wang
Kevin Kaichuang Yang
Xiaomeng Yang
Jiantao Jiao
Yuandong Tian
32
5
0
13 Oct 2023
MemGPT: Towards LLMs as Operating Systems
Charles Packer
Sarah Wooders
Kevin Lin
Vivian Fang
Shishir G. Patil
Ion Stoica
Joseph E. Gonzalez
RALM
49
129
0
12 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
42
221
0
12 Oct 2023
Jailbreaking Black Box Large Language Models in Twenty Queries
Patrick Chao
Alexander Robey
Yan Sun
Hamed Hassani
George J. Pappas
Eric Wong
AAML
63
628
0
12 Oct 2023
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Jing Liu
Ruihao Gong
Xiuying Wei
Zhiwei Dong
Jianfei Cai
Bohan Zhuang
MQ
37
52
0
12 Oct 2023
Harnessing Large Language Models' Empathetic Response Generation Capabilities for Online Mental Health Counselling Support
Siyuan Brandon Loh
Aravind Sesagiri Raamkumar
AI4MH
25
15
0
12 Oct 2023
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
Qingyue Zhao
Banghua Zhu
49
4
0
11 Oct 2023
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Wei Ping
Ming-Yu Liu
Lawrence C. McAfee
Peng Xu
Bo Li
Mohammad Shoeybi
Bryan Catanzaro
RALM
50
48
0
11 Oct 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP
VLM
42
28
0
11 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM
ALM
61
175
0
11 Oct 2023
OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models
Yuhe Liu
Changhua Pei
Longlong Xu
Bohan Chen
Mingze Sun
...
Gaogang Xie
Xidao Wen
Xiaohui Nie
Minghua Ma
Dan Pei
ELM
22
2
0
11 Oct 2023