ResearchTrend.AI
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
    ALM
    OSLM
    ELM

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 3,057 papers shown
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yifan Gong
Xing Xie
69
24
0
15 Nov 2023
PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning
Zhihan Zhang
Dong-Ho Lee
Yuwei Fang
Wenhao Yu
Mengzhao Jia
Meng Jiang
Francesco Barbieri
ALM
58
29
0
15 Nov 2023
Safer-Instruct: Aligning Language Models with Automated Preference Data
Taiwei Shi
Kai Chen
Jieyu Zhao
ALM
SyDa
40
25
0
15 Nov 2023
Predicting generalization performance with correctness discriminators
Yuekun Yao
Alexander Koller
59
1
0
15 Nov 2023
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
Weixiang Yan
Haitian Liu
Yunkun Wang
Yunzhe Li
Qian Chen
...
Tingyu Lin
Weishan Zhao
Li Zhu
Hari Sundaram
Shuiguang Deng
ELM
LRM
57
36
0
14 Nov 2023
REST: Retrieval-Based Speculative Decoding
Zhenyu He
Zexuan Zhong
Tianle Cai
Jason D. Lee
Di He
RALM
28
82
0
14 Nov 2023
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning
Shengguang Wu
Keming Lu
Benfeng Xu
Junyang Lin
Qi Su
Chang Zhou
SyDa
ALM
27
38
0
14 Nov 2023
Insights into Classifying and Mitigating LLMs' Hallucinations
Alessandro Bruno
P. Mazzeo
Aladine Chetouani
Marouane Tliba
M. A. Kerkouri
HILM
56
11
0
14 Nov 2023
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
Pengyu Cheng
Yifan Yang
Jian Li
Yong Dai
Tianhao Hu
Peixin Cao
Nan Du
Xiaolong Li
35
29
0
14 Nov 2023
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning
Ruixin Hong
Hongming Zhang
Xinyu Pang
Dong Yu
Changshui Zhang
LRM
54
25
0
14 Nov 2023
Fair Abstractive Summarization of Diverse Perspectives
Yusen Zhang
Nan Zhang
Yixin Liu
Alexander R. Fabbri
Junru Liu
...
Caiming Xiong
Jieyu Zhao
Dragomir R. Radev
Kathleen McKeown
Rui Zhang
38
8
0
14 Nov 2023
InCA: Rethinking In-Car Conversational System Assessment Leveraging Large Language Models
Ken E. Friedl
Abbas Goher Khan
S. Sahoo
Md. Rony
Jana Germies
Christian Süß
37
3
0
13 Nov 2023
Speech-based Slot Filling using Large Language Models
Guangzhi Sun
Shutong Feng
Dongcheng Jiang
Chao Zhang
Milica Gasic
P. Woodland
50
1
0
13 Nov 2023
LM-Polygraph: Uncertainty Estimation for Language Models
Ekaterina Fadeeva
Roman Vashurin
Akim Tsvigun
Artem Vazhentsev
Sergey Petrakov
...
Elizaveta Goncharova
Alexander Panchenko
Maxim Panov
Timothy Baldwin
Artem Shelmanov
32
56
0
13 Nov 2023
Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models
Shuaijie She
Shujian Huang
Xingyun Wang
Yanke Zhou
Jiajun Chen
ELM
HILM
30
0
0
13 Nov 2023
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models
Shangqing Tu
Yuliang Sun
Yushi Bai
Jifan Yu
Lei Hou
Juanzi Li
WaLM
38
10
0
13 Nov 2023
Towards the Law of Capacity Gap in Distilling Language Models
Chen Zhang
Dawei Song
Zheyu Ye
Yan Gao
ELM
45
20
0
13 Nov 2023
Flames: Benchmarking Value Alignment of LLMs in Chinese
Kexin Huang
Xiangyang Liu
Qianyu Guo
Tianxiang Sun
Jiawei Sun
...
Yixu Wang
Yan Teng
Xipeng Qiu
Yingchun Wang
Dahua Lin
ALM
62
12
0
12 Nov 2023
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Y. Zou
26
95
0
11 Nov 2023
Fake Alignment: Are LLMs Really Aligned Well?
Yixu Wang
Yan Teng
Kexin Huang
Chengqi Lyu
Songyang Zhang
Wenwei Zhang
Xingjun Ma
Yu-Gang Jiang
Yu Qiao
Yingchun Wang
43
18
0
10 Nov 2023
AI-native Interconnect Framework for Integration of Large Language Model Technologies in 6G Systems
Sasu Tarkoma
Roberto Morabito
Jaakko Sauvola
67
19
0
10 Nov 2023
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
151
136
0
09 Nov 2023
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs
Shuyi Xie
Wenlin Yao
Yong Dai
Shaobo Wang
Donlin Zhou
...
Zhichao Hu
Dong Yu
Zhengyou Zhang
Jing Nie
Yuhong Liu
ELM
ALM
38
4
0
09 Nov 2023
Chain of Images for Intuitively Reasoning
Fanxu Meng
Haotong Yang
Yiding Wang
Muhan Zhang
LRM
46
7
0
09 Nov 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
56
54
0
08 Nov 2023
LooGLE: Can Long-Context Language Models Understand Long Contexts?
Jiaqi Li
Mengmeng Wang
Zilong Zheng
Muhan Zhang
ELM
RALM
40
120
0
08 Nov 2023
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Jiale Cheng
Xiao Liu
Kehan Zheng
Pei Ke
Hongning Wang
Yuxiao Dong
Jie Tang
Minlie Huang
31
81
0
07 Nov 2023
Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment
Geyang Guo
Ranchi Zhao
Tianyi Tang
Wayne Xin Zhao
Ji-Rong Wen
ALM
55
28
0
07 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
138
392
0
07 Nov 2023
Which is better? Exploring Prompting Strategy For LLM-based Metrics
Joonghoon Kim
Saeran Park
Kiyoon Jeong
Sangmin Lee
S. Han
Jiyoon Lee
Pilsung Kang
25
16
0
07 Nov 2023
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Hanlin Zhang
Benjamin L. Edelman
Danilo Francati
Daniele Venturi
G. Ateniese
Boaz Barak
WaLM
149
55
0
07 Nov 2023
GPT4All: An Ecosystem of Open Source Compressed Language Models
Yuvanesh Anand
Zach Nussbaum
Adam Treat
Aaron Miller
Richard Guo
Ben Schmidt
GPT4All Community
Brandon Duderstadt
Andriy Mulyar
17
20
0
06 Nov 2023
GLaMM: Pixel Grounding Large Multimodal Model
H. Rasheed
Muhammad Maaz
Sahal Shaji Mullappilly
Abdelrahman M. Shaker
Salman Khan
Hisham Cholakkal
Rao M. Anwer
Eric Xing
Ming-Hsuan Yang
Fahad S. Khan
MLLM
VLM
57
217
0
06 Nov 2023
Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation
Rusheb Shah
Quentin Feuillade-Montixi
Soroush Pour
Arush Tagade
Stephen Casper
Javier Rando
34
128
0
06 Nov 2023
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Ying Sheng
Shiyi Cao
Dacheng Li
Coleman Hooper
Nicholas Lee
...
Banghua Zhu
Lianmin Zheng
Kurt Keutzer
Joseph E. Gonzalez
Ion Stoica
MoE
28
91
0
06 Nov 2023
DeepInception: Hypnotize Large Language Model to Be Jailbreaker
Xuan Li
Zhanke Zhou
Jianing Zhu
Jiangchao Yao
Tongliang Liu
Bo Han
55
167
0
06 Nov 2023
LitSumm: Large language models for literature summarisation of non-coding RNAs
Andrew Green
C. Ribas
Nancy Ontiveros-Palacios
Sam Griffiths-Jones
Anton I. Petrov
Alex Bateman
Blake Sweeney
42
4
0
06 Nov 2023
Post Turing: Mapping the landscape of LLM Evaluation
Alexey Tikhonov
Ivan P. Yamshchikov
ELM
66
4
0
03 Nov 2023
Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review
Mingze Yuan
Peng Bao
Jiajia Yuan
Yunhao Shen
Zi Chen
...
Jie Zhao
Yang Chen
Li Zhang
Lin Shen
Bin Dong
ELM
LM&MA
65
13
0
03 Nov 2023
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
Jiao Ou
Junda Lu
Che Liu
Yihong Tang
Fuzheng Zhang
Di Zhang
Kun Gai
ALM
LM&MA
55
14
0
03 Nov 2023
FLAP: Fast Language-Audio Pre-training
Ching-Feng Yeh
Po-Yao Huang
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
CLIP
VLM
49
8
0
02 Nov 2023
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks
Xinlu Zhang
Yujie Lu
Weizhi Wang
An Yan
Jun Yan
Lianke Qin
Heng Wang
Xifeng Yan
William Y. Wang
Linda R. Petzold
LM&MA
MLLM
ELM
38
80
0
02 Nov 2023
Making Harmful Behaviors Unlearnable for Large Language Models
Xin Zhou
Yi Lu
Ruotian Ma
Tao Gui
Qi Zhang
Xuanjing Huang
MU
54
9
0
02 Nov 2023
HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning
Yongjin Yang
Joonkee Kim
Yujin Kim
Namgyu Ho
James Thorne
Se-Young Yun
42
21
0
01 Nov 2023
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions
Taehyeon Kim
Joonkee Kim
Gihun Lee
Se-Young Yun
43
13
0
01 Nov 2023
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
Nathan Lambert
Roberto Calandra
ALM
68
32
0
31 Oct 2023
Defining a New NLP Playground
Sha Li
Chi Han
Pengfei Yu
Carl Edwards
Manling Li
...
Yi R. Fung
Charles Yu
Joel R. Tetreault
Eduard H. Hovy
Heng Ji
87
5
0
31 Oct 2023
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
Yuxin Jiang
Yufei Wang
Xingshan Zeng
Wanjun Zhong
Liangyou Li
Fei Mi
Lifeng Shang
Xin Jiang
Qun Liu
Wei Wang
ALM
25
29
0
31 Oct 2023
Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models
Chris Richardson
Yao Zhang
Kellen Gillespie
Sudipta Kar
Arshdeep Singh
Zeynab Raeesy
Omar Zia Khan
A. Sethy
RALM
37
12
0
30 Oct 2023
Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer
S. N. Hari
Matt Thomson
37
0
0
30 Oct 2023