Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
    ALM
    OSLM
    ELM

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 2,990 papers shown
Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation
A. Sridhar
Yinyi Guo
Erik M. Visser
Rehana Mahfuz
68
5
0
06 Sep 2023
Zero-Resource Hallucination Prevention for Large Language Models
Junyu Luo
Cao Xiao
Fenglong Ma
HILM
60
16
0
06 Sep 2023
CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Hongyu Hu
Jiyuan Zhang
Minyi Zhao
Zhenbang Sun
MLLM
30
43
0
05 Sep 2023
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
Fei Tang
Wanling Gao
Luzhou Peng
Jianfeng Zhan
ELM
22
2
0
05 Sep 2023
Making Large Language Models Better Reasoners with Alignment
Peiyi Wang
Lei Li
Liang Chen
Feifan Song
Binghuai Lin
Yunbo Cao
Tianyu Liu
Zhifang Sui
ALM
LRM
55
65
0
05 Sep 2023
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
Lingyue Fu
Huacan Chai
Shuang Luo
Kounianhua Du
Weiming Zhang
...
Jingkuan Wang
Siyuan Qi
Kangning Zhang
Weinan Zhang
Yong Yu
ELM
42
9
0
05 Sep 2023
Open Sesame! Universal Black Box Jailbreaking of Large Language Models
Raz Lapid
Ron Langberg
Moshe Sipper
AAML
39
108
0
04 Sep 2023
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
Chenliang Li
Hehong Chen
Mingshi Yan
Weizhou Shen
Haiyang Xu
...
Chen Cheng
Hongzhu Shi
Ji Zhang
Fei Huang
Jingren Zhou
LLMAG
40
20
0
02 Sep 2023
TouchStone: Evaluating Vision-Language Models by Language Models
Shuai Bai
Shusheng Yang
Jinze Bai
Peng Wang
Xing Zhang
Junyang Lin
Xinggang Wang
Chang Zhou
Jingren Zhou
MLLM
45
44
0
31 Aug 2023
Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
Xu Huang
Jianxun Lian
Yuxuan Lei
Jing Yao
Defu Lian
Xing Xie
LLMAG
31
91
0
31 Aug 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
41
22
0
31 Aug 2023
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Hritik Bansal
John Dang
Aditya Grover
ALM
58
20
0
30 Aug 2023
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation
Dawei Gao
Haibin Wang
Yaliang Li
Xiuyu Sun
Yichen Qian
Bolin Ding
Jingren Zhou
AI4TS
69
254
0
29 Aug 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG
RALM
31
512
0
28 Aug 2023
ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models
Baolin Zhang
Hai-Yong Xie
Pengfan Du
Junhao Chen
Pengfei Cao
Yubo Chen
Shengping Liu
Kang Liu
Jun Zhao
ELM
ALM
29
2
0
28 Aug 2023
DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation
Zhijie Bao
Wei Chen
Shengze Xiao
Kuang Ren
Jiaao Wu
Cheng Zhong
J. Peng
Xuanjing Huang
Zhongyu Wei
LM&MA
50
73
0
28 Aug 2023
Evaluating the Robustness to Instructions of Large Language Models
Yuansheng Ni
Sichao Jiang
Xinyu Wu
Hui Shen
Yuli Zhou
ALM
30
2
0
28 Aug 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
64
4
0
28 Aug 2023
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models
Kaiyuan Gao
Su He
Zhenyu He
Jiacheng Lin
Qizhi Pei
Jie Shao
Wei Zhang
LM&MA
SyDa
44
4
0
27 Aug 2023
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun
Yang Han
Zihan Zhao
Da Ma
Zhe-Wei Shen
Baocai Chen
Lu Chen
Kai Yu
ELM
45
76
0
25 Aug 2023
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Wenqi Shao
Mengzhao Chen
Zhaoyang Zhang
Peng Xu
Lirui Zhao
Zhiqiang Li
Kaipeng Zhang
Peng Gao
Yu Qiao
Ping Luo
MQ
46
182
0
25 Aug 2023
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
Haibo Jin
Haoxuan Che
Yi Lin
Haoxing Chen
MedIm
47
61
0
24 Aug 2023
Aligning Language Models with Offline Learning from Human Feedback
Jian Hu
Li Tao
J. Yang
Chandler Zhou
ALM
OffRL
48
7
0
23 Aug 2023
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
Ming Li
Yong Zhang
Zhitao Li
Jiuhai Chen
Lichang Chen
Ning Cheng
Jianzong Wang
Dinesh Manocha
Jing Xiao
48
180
0
23 Aug 2023
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao
Xiaoyuan Yi
Xiting Wang
Jindong Wang
Xing Xie
ALM
46
42
0
23 Aug 2023
Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation
Yi-Chiao Su
Dongyan An
Yuan Xu
Kehan Chen
Yan Huang
60
2
0
22 Aug 2023
Giraffe: Adventures in Expanding Context Lengths in LLMs
Arka Pal
Deep Karkhanis
Manley Roberts
Samuel Dooley
Arvind Sundararajan
Siddartha Naidu
50
40
0
21 Aug 2023
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Fei Wu
Guoyin Wang
LM&MA
29
554
0
21 Aug 2023
SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding
Tianyu Yu
Chengyue Jiang
Chao Lou
Shen Huang
Xiaobin Wang
...
Haitao Zheng
Ningyu Zhang
Pengjun Xie
Fei Huang
Yong Jiang
LRM
64
17
0
21 Aug 2023
PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator
Chuyi Kong
Yaxin Fan
Xiang Wan
Feng Jiang
Benyou Wang
47
8
0
21 Aug 2023
ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
Zhuolun He
Haoyuan Wu
Xinyun Zhang
Xufeng Yao
Su Zheng
Haisheng Zheng
Bei Yu
LLMAG
40
104
0
20 Aug 2023
UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding
Hao Feng
Zijian Wang
Jingqun Tang
Jinghui Lu
Wen-gang Zhou
Houqiang Li
Can Huang
MLLM
VLM
73
49
0
19 Aug 2023
GameEval: Evaluating LLMs on Conversational Games
Dan Qiao
Chenfei Wu
Yaobo Liang
Juntao Li
Nan Duan
ELM
LLMAG
35
21
0
19 Aug 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu
Y. Xu
Yuante Li
W. Li
Zhe Chen
Zhuowen Tu
MLLM
VLM
35
123
0
19 Aug 2023
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Rishabh Bhardwaj
Soujanya Poria
ELM
33
136
0
18 Aug 2023
End-to-End Beam Retrieval for Multi-Hop Question Answering
Jiahao Zhang
Hai-Feng Zhang
Dongmei Zhang
Yong Liu
Sheng Huang
RALM
33
24
0
17 Aug 2023
CMB: A Comprehensive Medical Benchmark in Chinese
Xidong Wang
Guiming Hardy Chen
Dingjie Song
Zhiyi Zhang
Zhihong Chen
...
Feng Jiang
Jianquan Li
Xiang Wan
Benyou Wang
Haizhou Li
LM&MA
ELM
AI4MH
44
80
0
17 Aug 2023
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Zekun Li
Baolin Peng
Pengcheng He
Xifeng Yan
ELM
SILM
AAML
41
24
0
17 Aug 2023
MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
Junru Lu
Siyu An
Mingbao Lin
Gabriele Pergola
Yulan He
Di Yin
Xing Sun
Yunsheng Wu
49
32
0
16 Aug 2023
From Commit Message Generation to History-Aware Commit Message Completion
Aleksandra V. Eliseeva
Yaroslav Sokolov
Egor Bogomolov
Yaroslav Golubev
Danny Dig
T. Bryksin
35
20
0
15 Aug 2023
LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation
Xiaoming Shi
Jinfeng Xu
Jinru Ding
Jiali Pang
Sichen Liu
...
Lu Lu
Haihong Yang
Mingtao Hu
Tong Ruan
Shaoting Zhang
LM&MA
ELM
41
13
0
15 Aug 2023
A Survey on Model Compression for Large Language Models
Xunyu Zhu
Jian Li
Yong Liu
Can Ma
Weiping Wang
45
200
0
15 Aug 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM
LLMAG
ALM
43
453
0
14 Aug 2023
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Keming Lu
Hongyi Yuan
Zheng Yuan
Runji Lin
Junyang Lin
Chuanqi Tan
Chang Zhou
Jingren Zhou
ALM
LRM
35
65
0
14 Aug 2023
Self-Alignment with Instruction Backtranslation
Xian Li
Ping Yu
Chunting Zhou
Timo Schick
Omer Levy
Luke Zettlemoyer
Jason Weston
M. Lewis
SyDa
31
128
0
11 Aug 2023
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
Zhiwei Liu
Weiran Yao
Jianguo Zhang
Le Xue
Shelby Heinecke
...
Ran Xu
P. Mùi
Haiquan Wang
Caiming Xiong
Silvio Savarese
LLMAG
44
84
0
11 Aug 2023
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
Ying Zhao
Yu Bowen
Binyuan Hui
Haiyang Yu
Fei Huang
Yongbin Li
N. Zhang
60
23
0
10 Aug 2023
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking
Fahim Dalvi
Maram Hasanain
Sabri Boughorbel
Basel Mousi
Samir Abdaljalil
...
Hamdy Mubarak
Ahmed M. Ali
Majd Hawasly
Nadir Durrani
Firoj Alam
33
24
0
09 Aug 2023
CLEVA: Chinese Language Models EVAluation Platform
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALM
ELM
41
10
0
09 Aug 2023
Generative Benchmark Creation for Table Union Search
Koyena Pal
Aamod Khatiwada
Roee Shraga
Renée J. Miller
48
0
0
07 Aug 2023