ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (arXiv 2306.05685)

9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM · OSLM · ELM

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 2,961 papers shown
Can Textual Gradient Work in Federated Learning?
Minghui Chen, Ruinan Jin, Wenlong Deng, Yuanyuan Chen, Zhi Huang, Han Yu, Xiaoxiao Li · FedML · 27 Feb 2025

Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models
Yi Jing, Zijun Yao, Lingxu Ran, Hongzhu Guo, Xiaozhi Wang, Lei Hou, Juanzi Li · 27 Feb 2025

Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation
Shaharukh Khan, Ayush Tarun, Ali Faraz, Palash Kamble, Vivek Dahiya, Praveen Kumar Pokala, Ashish Kulkarni, Chandra Khatri, Abhinav Ravi, Shubham Agarwal · 27 Feb 2025

Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang, Tianyang Liu, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, ..., Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo, Julian McAuley · ReLM, OffRL, LRM · 26 Feb 2025

Voting or Consensus? Decision-Making in Multi-Agent Debate
Lars Benedikt Kaesberg, Jonas Becker, Jan Philip Wahle, Terry Ruas, Bela Gipp · 26 Feb 2025
Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Kaishuai Xu, Tiezheng YU, Wenjun Hou, Yi Cheng, Liangyou Li, Xin Jiang, Lifeng Shang, Qiang Liu, Wenjie Li · ELM · 26 Feb 2025

Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team, Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, ..., Daniel Lee, Minchul Lee, MinHyung Lee, Shinbok Lee, Gaeun Seo · 26 Feb 2025

Reward Shaping to Mitigate Reward Hacking in RLHF
Jiayi Fu, Xuandong Zhao, Chengyuan Yao, Han Wang, Qi Han, Yanghua Xiao · 26 Feb 2025

Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems
Nayoung Choi, Grace Byun, Andrew Chung, Ellie S. Paek, S. Lee, Jinho D. Choi · RALM · 26 Feb 2025

Stay Focused: Problem Drift in Multi-Agent Debate
Jonas Becker, Lars Benedikt Kaesberg, Andreas Stephan, Jan Philip Wahle, Terry Ruas, Bela Gipp · 26 Feb 2025
END: Early Noise Dropping for Efficient and Effective Context Denoising
Hongye Jin, Pei Chen, Jingfeng Yang, Zhaoxiang Wang, Meng Jiang, ..., Xuzhi Zhang, Zheng Li, Tianyi Liu, Huasheng Li, Bing Yin · 26 Feb 2025

Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Shuliang Liu, Xinze Li, Zhenghao Liu, Yukun Yan, Cheng Yang, Zheni Zeng, Zhiyuan Liu, Maosong Sun, Ge Yu · RALM · 26 Feb 2025

Self-rewarding correction for mathematical reasoning
Wei Xiong, Hanning Zhang, Chenlu Ye, Lichang Chen, Nan Jiang, Tong Zhang · ReLM, KELM, LRM · 26 Feb 2025

Towards Optimal Multi-draft Speculative Decoding
Zhibo Hu, Tong Zheng, Vignesh Viswanathan, Ziyi Chen, Ryan Rossi, Yihan Wu, Dinesh Manocha, Heng Huang · 26 Feb 2025

Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang, Yuanhang Zhang, Yinpeng Dong, Hang Su · HILM, KELM · 26 Feb 2025
When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning
Yijiang River Dong, Tiancheng Hu, Yinhong Liu, Ahmet Üstün, Nigel Collier · 26 Feb 2025

Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning
Yanan Chen, Ali Pesaranghader, Tanmana Sadhu · LRM · 26 Feb 2025

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru, Jonas Geiping, Matthias Bethge, Ameya Prabhu · ReLM, ELM, LRM · 26 Feb 2025

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li · ALM, LRM · 26 Feb 2025

Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles
Kuang Wang, Xianrui Li, Steve Yang, Li Zhou, Feng Jiang, Haoyang Li · 26 Feb 2025

ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction
Jeesu Jung, Chanjun Park, Sangkeun Jung · 26 Feb 2025
Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent
Xiaofeng Wang, Zizhuo Zhang, Jinguang Zheng, Yiming Ai, Rui Wang · 25 Feb 2025

MergeIT: From Selection to Merging for Efficient Instruction Tuning
Hongyi Cai, Yuqian Fu, Hongming Fu, Bo Zhao · MoMe · 25 Feb 2025

RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction
Jianhao Yan, Yun Luo, Yue Zhang · LLMAG · 25 Feb 2025

Advantage-Guided Distillation for Preference Alignment in Small Language Models
Shiping Gao, Fanqi Wan, Jiajian Guo, Xiaojun Quan, Qifan Wang · ALM · 25 Feb 2025

Stackelberg Game Preference Optimization for Data-Efficient Alignment of Language Models
Xu Chu, Zhixin Zhang, Tianyu Jia, Yujie Jin · 25 Feb 2025

Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources
Joachim De Baer, A. Seza Doğruöz, T. Demeester, Chris Develder · 25 Feb 2025

Monte Carlo Temperature: a robust sampling strategy for LLM's uncertainty quantification methods
Nicola Cecere, Andrea Bacciu, Ignacio Fernández Tobías, Amin Mantrach · 25 Feb 2025
Can Large Language Models Extract Customer Needs as well as Professional Analysts?
Artem Timoshenko, Chengfeng Mao, J. Hauser · ELM · 25 Feb 2025

AMPO: Active Multi-Preference Optimization
Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan · 25 Feb 2025

Streaming Looking Ahead with Token-level Self-reward
Han Zhang, Ruixin Hong, Dong Yu · 24 Feb 2025

REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction
Omar Sharif, Joseph Gatto, Madhusudan Basak, S. Preum · 24 Feb 2025

Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring
Xuansheng Wu, Padmaja Pravin Saraf, Gyeong-Geon Lee, Ehsan Latif, Ninghao Liu, Xiaoming Zhai · 24 Feb 2025

CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Yepeng Weng, Dianwen Mei, Huishi Qiu, Xujie Chen, Li Liu, Jiang Tian, Zhongchao Shi · 24 Feb 2025

Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs
Gengyuan Zhang, Mingcong Ding, Tong Liu, Yao Zhang, Volker Tresp · 24 Feb 2025
RLTHF: Targeted Human Feedback for LLM Alignment
Yifei Xu, Tusher Chakraborty, Emre Kıcıman, Bibek Aryal, Eduardo Rodrigues, ..., Rafael Padilha, Leonardo Nunes, Shobana Balakrishnan, Songwu Lu, Ranveer Chandra · 24 Feb 2025

Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders
Xuansheng Wu, Jiayi Yuan, Wenlin Yao, Xiaoming Zhai, Ninghao Liu · LLMSV · 24 Feb 2025

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Yu Cheng · MoE · 24 Feb 2025

BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
Yupeng Chang, Yi-Ju Chang, Yuan Wu · AI4CE, ALM · 24 Feb 2025

REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler, Tom Wollschlager, M. H. I. Abdalla, Vincent Cohen-Addad, Johannes Gasteiger, Stephan Günnemann · AAML · 24 Feb 2025

Is Free Self-Alignment Possible?
Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala · MoMe · 24 Feb 2025
TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding
Zhaoxuan Wu, Zijian Zhou, Arun Verma, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low · 24 Feb 2025

Mitigating Bias in RAG: Controlling the Embedder
Taeyoun Kim, Jacob Mitchell Springer, Aditi Raghunathan, Maarten Sap · 24 Feb 2025

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Zechao Li, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang · OffRL · 24 Feb 2025

Evaluating Social Biases in LLM Reasoning
Xuyang Wu, Jinming Nian, Zhiqiang Tao, Yi Fang · LRM · 24 Feb 2025

Grounded Persuasive Language Generation for Automated Marketing
Jibang Wu, Chenghao Yang, Simon Mahns, Chaoqi Wang, Hao Zhu, Fei Fang, Haifeng Xu · 24 Feb 2025

Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following
Jie Zeng, Qianyu He, Qingyu Ren, Jiaqing Liang, Yanghua Xiao, Weikang Zhou, Zeye Sun, Fei Yu · 24 Feb 2025
From Documents to Dialogue: Building KG-RAG Enhanced AI Assistants
Manisha Mukherjee, Sungchul Kim, Xiang Chen, Dan Luo, Tong Yu, Tung Mai · RALM · 24 Feb 2025

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer, Punit Singh Koura, Binh Tang, R. Subramanian, Aaditya K. Singh, ..., Vedanuj Goswami, Sergey Edunov, Dieuwke Hupkes, Sanmi Koyejo, Sharan Narang · ALM · 24 Feb 2025

Model Lakes
Koyena Pal, David Bau, Renée J. Miller · 24 Feb 2025