ResearchTrend.AI
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
    ALM
    OSLM
    ELM

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 2,926 papers shown
Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions
Dong Jing
Nanyi Fei
Zhiwu Lu
56
0
0
24 Mar 2025
Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs
Chang Gao
Kang Zhao
Jianfei Chen
Liping Jing
52
0
0
24 Mar 2025
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
68
1
0
24 Mar 2025
A Multi-Model Adaptation of Speculative Decoding for Classification
Somnath Roy
Padharthi Sreekar
Srivatsa Narasimha
Anubhav Anand
53
0
0
23 Mar 2025
Won: Establishing Best Practices for Korean Financial NLP
Guijin Son
Hyunwoo Ko
Haneral Jung
Chami Hwang
54
0
0
23 Mar 2025
STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
Xunguang Wang
Wenxuan Wang
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Daoyuan Wu
Shuai Wang
64
1
0
23 Mar 2025
SLIDE: Sliding Localized Information for Document Extraction
Divyansh Singh
Manuel Nunez Martinez
Bonnie J. Dorr
Sonja Schmer-Galunder
44
0
0
23 Mar 2025
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities
Weixiang Zhao
Xingyu Sui
Jiahe Guo
Yulin Hu
Yang Deng
Yanyan Zhao
Bing Qin
Wanxiang Che
Tat-Seng Chua
Ting Liu
ELM
LRM
69
5
0
23 Mar 2025
Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
Sharan Maiya
Yinhong Liu
Ramit Debnath
Anna Korhonen
46
0
0
22 Mar 2025
ChatBench: From Static Benchmarks to Human-AI Evaluation
Serina Chang
Ashton Anderson
Jake M. Hofman
ELM
AI4MH
59
2
0
22 Mar 2025
Building Resource-Constrained Language Agents: A Korean Case Study on Chemical Toxicity Information
Hojun Cho
Donghu Kim
Soyoung Yang
Chan Lee
Hunjoo Lee
Jaegul Choo
61
1
0
22 Mar 2025
Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
Yalan Qin
Xiuying Chen
Rui Pan
Han Zhu
Chen Zhang
...
Chi-Min Chan
Sirui Han
Yike Guo
Yiran Yang
Yaodong Yang
OffRL
82
4
0
22 Mar 2025
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
Albert Sawczyn
Jakub Binkowski
Denis Janiak
Bogdan Gabrys
Tomasz Kajdanowicz
HILM
LRM
66
0
0
21 Mar 2025
Follow-up Question Generation For Enhanced Patient-Provider Conversations
Joseph Gatto
Parker Seegmiller
Timothy E. Burdick
Inas S. Khayal
Sarah DeLozier
S. Preum
LM&MA
MedIm
66
0
0
21 Mar 2025
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
Jian Guan
Jian Wu
Jia-Nan Li
Chuanqi Cheng
Wei Wu
LM&MA
91
0
0
21 Mar 2025
Conversational User-AI Intervention: A Study on Prompt Rewriting for Improved LLM Response Generation
Rupak Sarkar
Bahareh Sarrafzadeh
N. Chandrasekaran
Nagu Rangan
Philip Resnik
Longqi Yang
S. Jauhar
55
1
0
21 Mar 2025
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities
Aly M. Kassem
Bernhard Schölkopf
Zhijing Jin
31
0
0
20 Mar 2025
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
Ruihan Yang
Fanghua Ye
Jian Li
Siyu Yuan
Yikai Zhang
Zhaopeng Tu
Xiaolong Li
Deqing Yang
LLMAG
78
4
0
20 Mar 2025
Towards Automatic Continual Learning: A Self-Adaptive Framework for Continual Instruction Tuning
Peiyi Lin
Fukai Zhang
Kai Niu
Hao Fu
CLL
76
0
0
20 Mar 2025
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
Yifan Sun
Han Wang
Dongbai Li
Gang Wang
Huan Zhang
AAML
66
0
0
20 Mar 2025
VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning
Y. Tan
Chen Liu
Jingyuan Gao
Banghao Wu
Mingchen Li
...
Lingrong Zhang
Huiqun Yu
Guisheng Fan
Liang Hong
Bingxin Zhou
63
1
0
19 Mar 2025
A Language Vision Model Approach for Automated Tumor Contouring in Radiation Oncology
Yi Luo
H. Hooshangnejad
Xue Feng
Gaofeng Huang
Xiao Chen
Rui Zhang
Quan Chen
Wil Ngwa
Kai Ding
65
0
0
19 Mar 2025
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang
Munan Ning
Zheyuan Liu
Yanbo Wang
Jiayi Ye
Yue Huang
Shuo Yang
Xiao Chen
Y. Song
Li Yuan
LRM
67
0
0
19 Mar 2025
Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations
Shuo Li
Jiajun Sun
Guodong Zheng
Xiaoran Fan
Yujiong Shen
...
Wenming Tan
Tao Ji
Tao Gui
Qi Zhang
Xuanjing Huang
AAML
VLM
95
1
0
19 Mar 2025
Am I eligible? Natural Language Inference for Clinical Trial Patient Recruitment: the Patient's Point of View
Mathilde Aguiar
Pierre Zweigenbaum
Nona Naderi
LM&MA
51
0
0
19 Mar 2025
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
Donggon Jang
Yucheol Cho
Suin Lee
Taehyeon Kim
Dae-Shik Kim
VLM
70
1
0
18 Mar 2025
Where do Large Vision-Language Models Look at when Answering Questions?
X. Xing
Chia-Wen Kuo
Li Fuxin
Yulei Niu
Fan Chen
Ming Li
Ying Wu
Longyin Wen
Sijie Zhu
LRM
64
0
0
18 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji
L. Yu
Yeskendir Koishekenov
Yejin Bang
Anthony Hartshorn
Alan Schelten
Cheng Zhang
Pascale Fung
Nicola Cancedda
57
1
0
18 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
91
4
0
18 Mar 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
68
0
0
18 Mar 2025
Synthetic Clarification and Correction Dialogues about Data-Centric Tasks -- A Teacher-Student Approach
Christian Poelitz
Nick McKenna
54
1
0
18 Mar 2025
Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs
Jasmin Wachter
Michael Radloff
Maja Smolej
Katharina Kinder-Kurlanda
49
0
0
17 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
88
1
0
17 Mar 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Wan Ju Kang
Eunki Kim
Na Min An
Sangryul Kim
Haemin Choi
Ki Hoon Kwak
James Thorne
54
0
0
17 Mar 2025
Can Language Models Follow Multiple Turns of Entangled Instructions?
Chi Han
ELM
LRM
55
1
0
17 Mar 2025
Unveiling Pitfalls: Understanding Why AI-driven Code Agents Fail at GitHub Issue Resolution
Zhi Chen
Wei Ma
Lingxiao Jiang
LLMAG
58
0
0
16 Mar 2025
A Survey on the Optimization of Large Language Model-based Agents
Shangheng Du
Jiabao Zhao
Jinxin Shi
Zhentao Xie
Xin Jiang
Yanhong Bai
Liang He
LLMAG
LM&Ro
LM&MA
319
1
0
16 Mar 2025
From Demonstrations to Rewards: Alignment Without Explicit Human Preferences
Siliang Zeng
Yao Liu
Huzefa Rangwala
George Karypis
Mingyi Hong
Rasool Fakoor
57
2
0
15 Mar 2025
A Survey on Federated Fine-tuning of Large Language Models
Yebo Wu
Chunlin Tian
Jingguang Li
He Sun
Kahou Tam
Li Li
Chengzhong Xu
FedML
86
0
0
15 Mar 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
Chuhan Zhang
Chaoyang Zhu
Pingcheng Dong
Long Chen
Dong Zhang
ObjD
VLM
254
0
0
14 Mar 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELM
LRM
62
0
0
14 Mar 2025
Cross-Modal Learning for Music-to-Music-Video Description Generation
Zhuoyuan Mao
Mengjie Zhao
Qiyu Wu
Zhi-Wei Zhong
Wei-Hsiang Liao
Hiromi Wakaki
Yuki Mitsufuji
DiffM
VGen
87
0
0
14 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
59
1
0
14 Mar 2025
D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning
Jia Zhang
Chen-Xi Zhang
Yihong Liu
Yi-Xuan Jin
Xiao-Wen Yang
Bo Zheng
Yi Liu
Lan-Zhe Guo
54
2
0
14 Mar 2025
Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring
Kezia Oketch
John P. Lalor
Yi Yang
Ahmed Abbasi
ELM
57
1
0
14 Mar 2025
Source-primed Multi-turn Conversation Helps Large Language Models Translate Documents
Hanxu Hu
Jannis Vamvas
Rico Sennrich
55
0
0
13 Mar 2025
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1
Zhaoyi Li
Xiaohan Zhao
Dong-Dong Wu
Jiacheng Cui
Zhiqiang Shen
AAML
VLM
77
1
0
13 Mar 2025
Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?
So Young Lee
Russell Scheinberg
Amber Shore
Ameeta Agrawal
58
1
0
13 Mar 2025
SCE: Scalable Consistency Ensembles Make Blackbox Large Language Model Generation More Reliable
Jiaxin Zhang
Zechao Li
Wendi Cui
Kamalika Das
Bradley Malin
Sricharan Kumar
54
0
0
13 Mar 2025
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
Jiajun Li
Yixing Xu
Haiduo Huang
Xuanwu Yin
D. Li
Edith C. -H. Ngai
E. Barsoum
61
0
0
13 Mar 2025