Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
    ALM
    OSLM
    ELM

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 2,961 papers shown
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang
Quan Cui
Bingchen Zhao
Cheng Yang
MLLM
SyDa
59
1
0
11 Mar 2025
Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation
Junhao Zhang
Richong Zhang
Fanshuang Kong
Ziyang Miao
Yanhan Ye
Yaowei Zheng
SyDa
46
0
0
10 Mar 2025
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models
Kefan Song
Jin Yao
Runnan Jiang
Rohan Chandra
Shangtong Zhang
ALM
46
0
0
10 Mar 2025
CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
Haicheng Liao
Hanlin Kong
Bonan Wang
Chengyue Wang
Wang Ye
Zhengbing He
Chengzhong Xu
Zehan Li
71
4
0
10 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM
ObjD
57
0
0
10 Mar 2025
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan
Wenzhen Yuan
Xian Gao
Ye Guo
Daoxin Zhang
Zhe Xu
Yao Hu
Ting Liu
Yuzhuo Fu
LRM
VLM
77
4
0
10 Mar 2025
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following
Zhiyu Li
Kehai Chen
Yunfei Long
X. Bai
Yaoyin Zhang
Xuchen Wei
J. Li
Min Zhang
ELM
69
0
0
10 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL
LRM
75
3
0
10 Mar 2025
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko
Tianyi Chen
Sungnyun Kim
Tianyu Ding
Luming Liang
Ilya Zharkov
Se-Young Yun
VLM
252
0
0
10 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
74
0
0
09 Mar 2025
Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving
Enming Zhang
Peizhe Gong
Xingyuan Dai
Yisheng Lv
Qinghai Miao
MLLM
ELM
70
2
0
09 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
89
0
0
08 Mar 2025
LimTopic: LLM-based Topic Modeling and Text Summarization for Analyzing Scientific Articles limitations
Ibrahim Al Azhar
Venkata Devesh Reddy
Hamed Alhoori
A. Akella
55
4
0
08 Mar 2025
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation
Yinuo Liu
Zenghui Yuan
Guiyao Tie
Jiawen Shi
Lichao Sun
Neil Zhenqiang Gong
56
1
0
08 Mar 2025
Language Model Personalization via Reward Factorization
Idan Shenfeld
Felix Faltings
Pulkit Agrawal
Aldo Pacchiano
55
1
0
08 Mar 2025
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
Xudong Lu
Haohao Gao
Renshou Wu
Shuai Ren
Xiaoxin Chen
Hongsheng Li
Fangyuan Li
ELM
61
0
0
08 Mar 2025
GRP: Goal-Reversed Prompting for Zero-Shot Evaluation with LLMs
Mingyang Song
Mao Zheng
Xuan Luo
LRM
65
0
0
08 Mar 2025
Is Your Video Language Model a Reliable Judge?
M. Liu
Wensheng Zhang
72
2
0
07 Mar 2025
Extracting and Emulsifying Cultural Explanation to Improve Multilingual Capability of LLMs
Hamin Koo
Jaehyung Kim
53
0
0
07 Mar 2025
SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding
Kaiyu Huang
Yu Wang
Zhubo Shi
Han Zou
Minchen Yu
Qingjiang Shi
LRM
54
2
0
07 Mar 2025
Learning and generalization of robotic dual-arm manipulation of boxes from demonstrations via Gaussian Mixture Models (GMMs)
Qian Ying Lee
Suhas Raghavendra Kulkarni
Kenzhi Iskandar Wong
Lin Yang
Bernardo Noronha
Yongjun Wee
Tzu-Yi Hung
Domenico Campolo
53
0
0
07 Mar 2025
RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
Tianjun Wei
Wei Wen
Ruizhi Qiao
Xing Sun
Jianghong Ma
ALM
ELM
57
1
0
07 Mar 2025
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information
Feng Jiang
Zhiyu Lin
Fan Bu
Yuhao Du
Benyou Wang
Haoyang Li
AuLLM
ELM
106
0
0
07 Mar 2025
No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding
Michael Krumdick
Charles Lovering
Varshini Reddy
Seth Ebner
Chris Tanner
ALM
ELM
69
2
0
07 Mar 2025
Dynamic-KGQA: A Scalable Framework for Generating Adaptive Question Answering Datasets
Preetam Prabhu Srikar Dammu
Himanshu Naidu
Chirag Shah
47
0
0
06 Mar 2025
LLMs Can Generate a Better Answer by Aggregating Their Own Responses
Zichong Li
Xinyu Feng
Yuheng Cai
Zixuan Zhang
Tianyi Liu
Chen Liang
Weizhu Chen
Haoyu Wang
Tiejun Zhao
LRM
60
1
0
06 Mar 2025
Benchmarking Large Language Models on Multiple Tasks in Bioinformatics NLP with Prompting
Jiyue Jiang
Pengan Chen
Jinqiao Wang
Dongchen He
Ziqin Wei
...
Yimin Fan
Xiangyu Shi
Jimeng Sun
Chuan Wu
Yuan Li
LM&MA
60
1
0
06 Mar 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
Sreyan Ghosh
Zhifeng Kong
Sonal Kumar
S. Sakshi
Jaehyeon Kim
Ming-Yu Liu
Rafael Valle
Dinesh Manocha
Bryan Catanzaro
MLLM
AuLLM
LRM
64
9
0
06 Mar 2025
Shaping Shared Languages: Human and Large Language Models' Inductive Biases in Emergent Communication
Tom Kouwenhoven
Max Peeperkorn
R. D. Kleijn
Tessa Verhoef
69
0
0
06 Mar 2025
The Challenge of Identifying the Origin of Black-Box Large Language Models
Ziqing Yang
Yixin Wu
Yun Shen
Wei Dai
Michael Backes
Yang Zhang
AAML
47
0
0
06 Mar 2025
Adding Alignment Control to Language Models
Wenhong Zhu
Weinan Zhang
Rui Wang
65
0
0
06 Mar 2025
Topology-Aware Conformal Prediction for Stream Networks
Jifan Zhang
Fangxin Wang
Philip S. Yu
Kaize Ding
Shixiang Zhu
AI4TS
43
0
0
06 Mar 2025
Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment
Wen Yang
Junhong Wu
Chen Wang
Chengqing Zong
J.N. Zhang
82
1
0
06 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Qiufeng Wang
Tony Q.S. Quek
Soujanya Poria
Zuozhu Liu
57
0
0
06 Mar 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Shri Kiran Srinivasan
Mohammed Irfan Kurpath
Sahal Shaji Mullappilly
Jean Lahoud
Fahad A Khan
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
AuLLM
207
0
0
06 Mar 2025
How Do Hackathons Foster Creativity? Towards AI Collaborative Evaluation of Creativity at Scale
Jeanette Falk
Yiyi Chen
Janet Rafner
Mike Zhang
Johannes Bjerva
Alexander Nolte
68
1
0
06 Mar 2025
SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection
Yi-Fan Lu
Xian-Ling Mao
Tian Lan
Tong Zhang
Yu-Shi Zhu
Heyan Huang
57
0
0
05 Mar 2025
See What You Are Told: Visual Attention Sink in Large Multimodal Models
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
115
5
0
05 Mar 2025
Process-based Self-Rewarding Language Models
Shimao Zhang
Xiao Liu
Xin Zhang
Junxiao Liu
Zheheng Luo
Shujian Huang
Yeyun Gong
ReLM
SyDa
LRM
97
3
0
05 Mar 2025
HeTGB: A Comprehensive Benchmark for Heterophilic Text-Attributed Graphs
Shujie Li
Yuxia Wu
Chuan Shi
Yuan Fang
49
0
0
05 Mar 2025
CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
Yuqi Zhou
Shuai Wang
Sunhao Dai
Qinglin Jia
Zhaocheng Du
Zhenhua Dong
Jun Xu
LM&Ro
80
0
0
05 Mar 2025
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks
Javier Yong
Haokai Ma
Yunshan Ma
Anis Yusof
Zhenkai Liang
E. Chang
62
0
0
05 Mar 2025
Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents
Jingying Zeng
Hui Liu
Zhenwei Dai
Xianfeng Tang
Chen Luo
Samarth Varshney
Zhen Li
Qi He
HILM
69
1
0
05 Mar 2025
CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
Peiding Wang
Lihe Zhang
Fang Liu
Lin Shi
Minxiao Li
Bo Shen
An Fu
ELM
LRM
214
1
0
05 Mar 2025
Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation
Hiroshi Takahashi
Tomoharu Iwata
Atsutoshi Kumagai
Yuuki Yamanaka
Tomoya Yamashita
DiffM
77
0
0
05 Mar 2025
Towards Robust Universal Information Extraction: Benchmark, Evaluation, and Solution
Jizhao Zhu
Akang Shi
Zhiyu Li
Long Bai
Xiaolong Jin
Jiafeng Guo
Xueqi Cheng
63
0
0
05 Mar 2025
Towards Effective and Efficient Context-aware Nucleus Detection in Histopathology Whole Slide Images
Zhongyi Shui
Ruizhe Guo
Honglin Li
Yuxuan Sun
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Pingyi Chen
Yanzhou Su
Lin Yang
58
0
0
04 Mar 2025
AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
Songming Zhang
Xue Zhang
Tong Zhang
Bojie Hu
Yufeng Chen
Jinan Xu
60
1
0
04 Mar 2025
Reflection on Data Storytelling Tools in the Generative AI Era from the Human-AI Collaboration Perspective
Haotian Li
Yuanbo Wang
Huamin Qu
41
0
0
04 Mar 2025
Adversarial Tokenization
Renato Lui Geh
Zilei Shao
Guy Van den Broeck
SILM
AAML
92
0
0
04 Mar 2025