ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.05685
  4. Cited By
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
    ALM
    OSLM
    ELM
ArXivPDFHTML

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 2,997 papers shown
Title
Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning
Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning
Xuyang Wu
Jinming Nian
Ting-Ruen Wei
Zhiqiang Tao
Hsin-Tai Wu
Yi Fang
LRM
74
0
0
21 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
276
2
0
21 Feb 2025
Pub-Guard-LLM: Detecting Fraudulent Biomedical Articles with Reliable Explanations
Pub-Guard-LLM: Detecting Fraudulent Biomedical Articles with Reliable Explanations
Lihu Chen
Shuojie Fu
Gabriel Freedman
Cemre Zor
Guy Martin
James Kinross
Uddhav Vaghela
Ovidiu Serban
Francesca Toni
DeLMO
84
0
0
21 Feb 2025
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Elliot Schumacher
Dhruv Naik
Anitha Kannan
LM&MA
49
0
0
20 Feb 2025
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
Wenwen Xie
Gray Gwizdz
Dongji Feng
92
0
0
20 Feb 2025
C2T: A Classifier-Based Tree Construction Method in Speculative Decoding
C2T: A Classifier-Based Tree Construction Method in Speculative Decoding
Feiye Huo
Jianchao Tan
Kai Zhang
Xunliang Cai
Shengli Sun
51
0
0
20 Feb 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
78
26
0
20 Feb 2025
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du
Zehan Li
Pengyu Cheng
Zhihong Chen
Yuejiao Xie
Xiang Wan
Anningzhe Gao
53
1
0
20 Feb 2025
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Michael Luo
Xiaoxiang Shi
Colin Cai
Tianjun Zhang
Justin Wong
...
Chi Wang
Yanping Huang
Zhifeng Chen
Joseph E. Gonzalez
Ion Stoica
72
3
0
20 Feb 2025
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge
Ha Trinh
Newman Cheng
Joshua Bradley
Alex Chao
Apurva Mody
Steven Truitt
Dasha Metropolitansky
Robert Osazuwa Ness
Jonathan Larson
RALM
145
364
0
20 Feb 2025
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Joshua Harris
Timothy Laurence
Leo Loman
Fan Grayson
Toby Nonnenmacher
...
Hamish Mohammed
Thomas Finnie
Luke Hounsome
Michael Borowitz
Steven Riley
LM&MA
AI4MH
96
5
0
20 Feb 2025
Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models
Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models
Seonil Son
Ju-Min Oh
Heegon Jin
Cheolhun Jang
Jeongbeom Jeong
Kuntae Kim
70
0
0
20 Feb 2025
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
Faster WIND: Accelerating Iterative Best-of-NNN Distillation for LLM Alignment
Tong Yang
Jincheng Mei
H. Dai
Zixin Wen
Shicong Cen
Dale Schuurmans
Yuejie Chi
Bo Dai
50
4
0
20 Feb 2025
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Zhili Liu
Yunhao Gou
Kai Chen
Lanqing Hong
Jiahui Gao
...
Yu Zhang
Zhenguo Li
Xin Jiang
Qiang Liu
James T. Kwok
MoE
133
9
0
20 Feb 2025
Optimizing Model Selection for Compound AI Systems
Optimizing Model Selection for Compound AI Systems
Lingjiao Chen
Jared Quincy Davis
Boris Hanin
Peter Bailis
Matei A. Zaharia
James Zou
Ion Stoica
86
1
0
20 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Zehan Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Yansen Wang
56
0
0
19 Feb 2025
Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models
Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models
Srishti Yadav
Zhi Zhang
Daniel Hershcovich
Ekaterina Shutova
64
2
0
18 Feb 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou
Shuo Wang
Zhihang Yuan
Mingjia Shi
Yuzhang Shang
Dawei Yang
ALM
MQ
111
0
0
18 Feb 2025
Computational Safety for Generative AI: A Signal Processing Perspective
Computational Safety for Generative AI: A Signal Processing Perspective
Pin-Yu Chen
93
1
0
18 Feb 2025
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
Weikang Qiu
Zheng Huang
Haoyu Hu
Aosong Feng
Yujun Yan
Rex Ying
70
0
0
18 Feb 2025
Savaal: Scalable Concept-Driven Question Generation to Enhance Human Learning
Savaal: Scalable Concept-Driven Question Generation to Enhance Human Learning
Kimia Noorbakhsh
Joseph Chandler
Pantea Karimi
M. Alizadeh
H. Balakrishnan
LRM
61
1
0
18 Feb 2025
Multi-Attribute Steering of Language Models via Targeted Intervention
Multi-Attribute Steering of Language Models via Targeted Intervention
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Joey Tianyi Zhou
LLMSV
112
0
0
18 Feb 2025
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
Junkai Chen
Zhijie Deng
Kening Zheng
Yibo Yan
Shuliang Liu
PeiJun Wu
Peijie Jiang
Qingbin Liu
Xuming Hu
MU
73
5
0
18 Feb 2025
Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
Shu Yang
Shenzhe Zhu
Zeyu Wu
Keyu Wang
Junchi Yao
Junchao Wu
Lijie Hu
Mengdi Li
Derek F. Wong
Di Wang
18
0
0
18 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
135
4
0
18 Feb 2025
RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars
RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars
Yuncheng Hua
Zhuang Li
Zhuang Li
Hao Xue
Flora D. Salim
Gholamreza Haffari
ALM
134
0
0
17 Feb 2025
System Message Generation for User Preferences using Open-Source Models
System Message Generation for User Preferences using Open-Source Models
Minbyul Jeong
Jungho Cho
Minsoo Khang
Dawoon Jung
Teakgyu Hong
50
0
0
17 Feb 2025
InsBank: Evolving Instruction Subset for Ongoing Alignment
InsBank: Evolving Instruction Subset for Ongoing Alignment
Jiayi Shi
Yiwei Li
Shaoxiong Feng
Peiwen Yuan
Xiaobei Wang
...
Chuyi Tan
Boyuan Pan
Huan Ren
Yao Hu
Kan Li
ALM
97
0
0
17 Feb 2025
Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
Yue Xu
Chengyan Fu
Li Xiong
Sibei Yang
Wenjie Wang
70
0
0
17 Feb 2025
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis
Wenbo Zhang
Hengrui Cai
Wenyu Chen
89
0
0
17 Feb 2025
VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
Hugh Mee Wong
Rick Nouwen
Albert Gatt
75
0
0
17 Feb 2025
DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing
DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing
Yi Wang
Fenghua Weng
Shangshang Yang
Zhan Qin
Minlie Huang
Wenjie Wang
KELM
AAML
58
0
0
17 Feb 2025
AURORA:Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification
AURORA:Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification
Jue Chen
Tianchu Yao
Chao Qu
Bin Li
Minghao Yang
...
Haozhe Wang
Xihe Qiu
Wei Chu
Yinghui Xu
Yuan Qi
OffRL
LRM
70
2
0
17 Feb 2025
Evaluating Step-by-step Reasoning Traces: A Survey
Evaluating Step-by-step Reasoning Traces: A Survey
Jinu Lee
Julia Hockenmaier
LRM
ELM
23
0
0
17 Feb 2025
Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning
Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning
Tianyi Wu
Jingwei Ni
Bryan Hooi
Jiaheng Zhang
Elliott Ash
See-Kiong Ng
Mrinmaya Sachan
Markus Leippold
66
0
0
17 Feb 2025
LeDex: Training LLMs to Better Self-Debug and Explain Code
LeDex: Training LLMs to Better Self-Debug and Explain Code
Nan Jiang
Xiaopeng Li
Shiqi Wang
Qiang Zhou
Soneya Binta Hossain
Baishakhi Ray
Varun Kumar
Xiaofei Ma
Anoop Deoras
LRM
102
13
0
17 Feb 2025
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection
Jinhe Bi
Yifan Wang
Danqi Yan
Xun Xiao
Artur Hecker
Volker Tresp
Yunpu Ma
VLM
75
4
0
17 Feb 2025
GiFT: Gibbs Fine-Tuning for Code Generation
GiFT: Gibbs Fine-Tuning for Code Generation
Haochen Li
Wanjin Feng
Xin Zhou
Zhiqi Shen
SyDa
84
1
0
17 Feb 2025
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang
Jiajia Li
Lu Yang
Zhiqiang Zhang
Jinghao Tian
Zehan Li
Lefei Zhang
Peijie Wang
58
0
0
17 Feb 2025
From Selection to Generation: A Survey of LLM-based Active Learning
From Selection to Generation: A Survey of LLM-based Active Learning
Yu Xia
Subhojyoti Mukherjee
Zhouhang Xie
Junda Wu
Xintong Li
...
Namyong Park
T. Nguyen
Jiebo Luo
Ryan Rossi
Julian McAuley
62
0
0
17 Feb 2025
A Critical Look At Tokenwise Reward-Guided Text Generation
A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid
Ruotian Wu
Julia Grosse
Agustinus Kristiadi
Pascal Poupart
OffRL
86
0
0
17 Feb 2025
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
Fengqing Jiang
Zhangchen Xu
Yuetai Li
Luyao Niu
Zhen Xiang
Yue Liu
Bill Yuchen Lin
Radha Poovendran
KELM
ELM
LRM
91
21
0
17 Feb 2025
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
Hui Wei
Zihao Zhang
Shenghua He
Tian Xia
Shijia Pan
Fei Liu
76
7
0
16 Feb 2025
Leveraging Uncertainty Estimation for Efficient LLM Routing
Leveraging Uncertainty Estimation for Efficient LLM Routing
Tuo Zhang
Asal Mehradfar
Dimitrios Dimitriadis
Salman Avestimehr
72
1
0
16 Feb 2025
Uncertainty-Aware Step-wise Verification with Generative Reward Models
Uncertainty-Aware Step-wise Verification with Generative Reward Models
Zihuiwen Ye
Luckeciano C. Melo
Younesse Kaddar
Phil Blunsom
Shivalika Singh
Yarin Gal
LRM
78
2
0
16 Feb 2025
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Yao-Ching Yu
Tsun-Han Chiang
Cheng-Wei Tsai
Chien-Ming Huang
Wen-Kwang Tsao
79
6
0
16 Feb 2025
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
Hongye Cao
Yanming Wang
Sijia Jing
Ziyue Peng
Zhixin Bai
...
Yang Gao
Fanyu Meng
Xi Yang
Chao Deng
Junlan Feng
AAML
57
1
0
16 Feb 2025
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu
Jingyang Li
Xingyu Xie
Zhihui Lu
Kim-Chuan Toh
Pan Zhou
77
0
0
16 Feb 2025
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie
Qingqiu Li
Zhuohao Yu
Yuejie Zhang
Yue Zhang
Linyi Yang
ELM
74
1
0
15 Feb 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou
Yuda Song
Andrea Zanette
ALM
89
0
0
14 Feb 2025
Previous
123...121314...585960
Next