Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.05685
Cited By
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"
50 / 2,926 papers shown
Title
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
Reya Vir
Shreya Shankar
Harrison Chase
Will Fu-Hinthorn
Aditya G. Parameswaran
AI4TS
39
0
0
20 Apr 2025
Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation
Tuhina Tripathi
Manya Wadhwa
Greg Durrett
S. Niekum
39
0
0
20 Apr 2025
Learning from Reasoning Failures via Synthetic Data Generation
Gabriela Ben-Melech Stan
Estelle Aflalo
Avinash Madasu
Vasudev Lal
Phillip Howard
SyDa
LRM
51
0
0
20 Apr 2025
A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models
Hongming Tan
Shaoxiong Zhan
Fengwei Jia
Hai-Tao Zheng
Wai Kin Victor Chan
31
0
0
20 Apr 2025
Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management
Hang Zhang
Jiuchen Shi
Yixiao Wang
Quan Chen
Yizhou Shan
Minyi Guo
41
0
0
19 Apr 2025
Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy
Regan Bolton
Mohammadreza Sheikhfathollahi
Simon Parkinson
Dan Basher
Howard Parkinson
38
0
0
18 Apr 2025
Secure Multifaceted-RAG for Enterprise: Hybrid Knowledge Retrieval with Security Filtering
Grace Byun
S. Lee
Nayoung Choi
Jinho D. Choi
40
0
0
18 Apr 2025
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
Yicheng Chen
Yining Li
Kai Hu
Zerun Ma
Haochen Ye
Kai Chen
36
0
0
18 Apr 2025
D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Model
Grace Byun
Jinho D. Choi
EGVM
51
0
0
18 Apr 2025
Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results
Andrea Santilli
Adam Goliñski
Michael Kirchhof
Federico Danieli
Arno Blaas
Miao Xiong
Luca Zappella
Sinead Williamson
25
0
0
18 Apr 2025
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni
Jiachen Pu
Zhongyi Yang
Kun Zhou
Hui Wang
Xiaoliang Xiao
Dakui Wang
Xin Li
Jingfeng Luo
Conggang Hu
41
0
0
18 Apr 2025
MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks
Jaime Raldua Veuthey
Zainab Ali Majid
Suhas Hariharan
Jacob Haimes
ELM
33
0
0
18 Apr 2025
CodeVisionary: An Agent-based Framework for Evaluating Large Language Models in Code Generation
Xinchen Wang
Pengfei Gao
Chao Peng
Ruida Hu
Cuiyun Gao
ELM
36
0
0
18 Apr 2025
Benchmarking LLM-based Relevance Judgment Methods
Negar Arabzadeh
Charles L. A. Clarke
40
0
0
17 Apr 2025
Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
Xiaotian Zhang
Ruizhe Chen
Yang Feng
Zuozhu Liu
50
0
0
17 Apr 2025
Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer
Huaizhi Qu
Inyoung Choi
Zhen Tan
Song Wang
Sukwon Yun
Qi Long
Faizan Siddiqui
Kwonjoon Lee
Tianlong Chen
50
0
0
17 Apr 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
43
0
0
16 Apr 2025
Could Thinking Multilingually Empower LLM Reasoning?
Changjiang Gao
Xu Huang
Wenhao Zhu
Shujian Huang
Lei Li
Fei Yuan
LRM
37
2
0
16 Apr 2025
LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA
Xanh Ho
Jiahao Huang
Florian Boudin
Akiko Aizawa
ELM
40
0
0
16 Apr 2025
A Dual-Space Framework for General Knowledge Distillation of Large Language Models
Xuzhi Zhang
Songming Zhang
Yunlong Liang
Fandong Meng
Yufeng Chen
Jinan Xu
Jie Zhou
38
0
0
15 Apr 2025
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
Divyansh Garg
Shaun VanWeelden
Diego Caples
Andis Draguns
Nikil Ravi
...
Youngchul Joo
Jindong Gu
Charles London
Christian Schroeder de Witt
S. Motwani
54
1
0
15 Apr 2025
Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails
William Hackett
Lewis Birch
Stefan Trawicki
N. Suri
Peter Garraghan
40
2
0
15 Apr 2025
Localized Cultural Knowledge is Conserved and Controllable in Large Language Models
V. Veselovsky
Berke Argin
Benedikt Stroebl
Chris Wendler
Robert West
James Evans
Thomas L. Griffiths
Arvind Narayanan
60
0
0
14 Apr 2025
SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging
Tan-Hanh Pham
Chris Ngo
Trong-Duong Bui
Minh Luu Quang
Tan-Huong Pham
Truong-Son Hy
36
1
0
14 Apr 2025
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Wenyuan Zhang
Shuaiyi Nie
Xinghua Zhang
Zefeng Zhang
Tingwen Liu
ELM
LRM
54
2
0
14 Apr 2025
CHARM: Calibrating Reward Models With Chatbot Arena Scores
Xiao Zhu
Chenmien Tan
Pinzhen Chen
Rico Sennrich
Yanlin Zhang
Hanxu Hu
ALM
41
1
0
14 Apr 2025
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
Kristina Nikolić
Luze Sun
Jie Zhang
F. Tramèr
36
0
0
14 Apr 2025
Enhancing LLM-based Recommendation through Semantic-Aligned Collaborative Knowledge
Zihan Wang
Jinghao Lin
Xiaocui Yang
Yongkang Liu
Shi Feng
Daling Wang
Wenjie Qu
28
0
0
14 Apr 2025
Reasoning Court: Combining Reasoning, Action, and Judgment for Multi-Hop Reasoning
Jingtian Wu
Claire Cardie
LRM
34
0
0
14 Apr 2025
DICE: A Framework for Dimensional and Contextual Evaluation of Language Models
Aryan Shrivastava
Paula Akemi Aoyagui
38
0
0
14 Apr 2025
LLM-driven Constrained Copy Generation through Iterative Refinement
Varun Vasudevan
Faezeh Akhavizadegan
Abhinav Prakash
Yokila Arora
Jason H. D. Cho
Tanya Mendiratta
Sushant Kumar
Kannan Achan
37
0
0
14 Apr 2025
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
Shengao Wang
Arjun Chandra
Aoming Liu
Venkatesh Saligrama
Boqing Gong
MLLM
VLM
49
0
0
13 Apr 2025
QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model
Zongxian Yang
Jiayu Qian
Z. Huang
Kay Chen Tan
LM&MA
LRM
38
0
0
13 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
51
0
0
12 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
45
5
0
12 Apr 2025
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Prabhat Pandey
Rupak Vignesh Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
236
0
0
12 Apr 2025
SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders
Ashmi Banerjee
Adithi Satish
Fitri Nur Aisyah
Wolfgang Wörndl
Yashar Deldjoo
AI4TS
37
0
0
12 Apr 2025
Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices
Shengyuan Ye
Bei Ouyang
Liekang Zeng
Tianyi Qian
Xiaowen Chu
Jian Tang
Xu Chen
37
1
0
11 Apr 2025
SAEs
Can
\textit{Can}
Can
Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed
Jacopo Bonato
Mona Diab
Virginia Smith
MU
66
2
0
11 Apr 2025
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
Qi Zhi Lim
C. Lee
K. Lim
Kalaiarasi Sonai Muthu Anbananthen
36
0
0
11 Apr 2025
Evaluation and Incident Prevention in an Enterprise AI Assistant
Akash Maharaj
David Arbour
Daniel Lee
Uttaran Bhattacharya
Anup B. Rao
Austin Zane
Avi Feller
Kun Qian
Yunyao Li
32
0
0
11 Apr 2025
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lù
Amirhossein Kazemnejad
Nicholas Meade
Arkil Patel
Dongchan Shin
Alejandra Zambrano
Karolina Stañczak
Peter Shaw
Christopher Pal
Siva Reddy
LLMAG
50
1
0
11 Apr 2025
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
Jiaming Xu
Jiayi Pan
Yongkang Zhou
Siming Chen
Jiajian Li
Yaoxiu Lian
Junyi Wu
Guohao Dai
LRM
40
0
0
11 Apr 2025
Fast-Slow-Thinking: Complex Task Solving with Large Language Models
Yiliu Sun
Yanfang Zhang
Zicheng Zhao
Sheng Wan
Dacheng Tao
Chen Gong
LRM
43
0
0
11 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
54
4
0
11 Apr 2025
Large Language Models Could Be Rote Learners
Yuyang Xu
Renjun Hu
Haochao Ying
Jian Wu
Xing Shi
Wei Lin
ELM
214
0
0
11 Apr 2025
Large Language Models as Span Annotators
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
47
0
0
11 Apr 2025
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery
Amirhossein Abaskohi
A. Ramesh
Shailesh Nanisetty
Chirag Goel
David Vazquez
Christopher Pal
Spandana Gella
Giuseppe Carenini
I. Laradji
44
0
0
10 Apr 2025
Synthesizing High-Quality Programming Tasks with LLM-based Expert and Student Agents
Manh Hung Nguyen
Victor-Alexandru Pădurean
Alkis Gotovos
Sebastian Tschiatschek
Adish Singla
29
0
0
10 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
54
0
0
10 Apr 2025
Previous
1
2
3
4
5
...
57
58
59
Next