Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.05685
Cited By
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"
50 / 3,059 papers shown
Title
A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid
Ruotian Wu
Julia Grosse
Agustinus Kristiadi
Pascal Poupart
OffRL
86
0
0
17 Feb 2025
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Yao-Ching Yu
Tsun-Han Chiang
Cheng-Wei Tsai
Chien-Ming Huang
Wen-Kwang Tsao
79
6
0
16 Feb 2025
Uncertainty-Aware Step-wise Verification with Generative Reward Models
Zihuiwen Ye
Luckeciano C. Melo
Younesse Kaddar
Phil Blunsom
Shivalika Singh
Yarin Gal
LRM
86
2
0
16 Feb 2025
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu
Jingyang Li
Xingyu Xie
Zhihui Lu
Kim-Chuan Toh
Pan Zhou
77
0
0
16 Feb 2025
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
Hui Wei
Zihao Zhang
Shenghua He
Tian Xia
Shijia Pan
Fei Liu
83
8
0
16 Feb 2025
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
Hongye Cao
Yanming Wang
Sijia Jing
Ziyue Peng
Zhixin Bai
...
Yang Gao
Fanyu Meng
Xi Yang
Chao Deng
Junlan Feng
AAML
63
1
0
16 Feb 2025
Leveraging Uncertainty Estimation for Efficient LLM Routing
Tuo Zhang
Asal Mehradfar
Dimitrios Dimitriadis
Salman Avestimehr
72
1
0
16 Feb 2025
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie
Qingqiu Li
Zhuohao Yu
Yuejie Zhang
Yue Zhang
Linyi Yang
ELM
74
1
0
15 Feb 2025
Preference learning made easy: Everything should be understood through win rate
Lily H. Zhang
Rajesh Ranganath
87
0
0
14 Feb 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou
Yuda Song
Andrea Zanette
ALM
89
0
0
14 Feb 2025
Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models
Xin Zhou
Yiwen Guo
Ruotian Ma
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
110
2
0
13 Feb 2025
GoRA: Gradient-driven Adaptive Low Rank Adaptation
Haonan He
Peng Ye
Yuchen Ren
Yuan Yuan
Lei Chen
AI4TS
AI4CE
318
0
0
13 Feb 2025
Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies
Sunnie S. Y. Kim
J. Vaughan
Q. V. Liao
Tania Lombrozo
Olga Russakovsky
133
5
0
12 Feb 2025
HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses
Sujeong Lee
Hayoung Lee
Seongsoo Heo
Wonik Choi
HILM
100
0
0
12 Feb 2025
Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning
Zijian He
Reyna Abhyankar
Vikranth Srivatsa
Yiying Zhang
65
1
0
12 Feb 2025
No Need for Explanations: LLMs can implicitly learn from mistakes in-context
Lisa Alazraki
Maximilian Mozes
Jon Ander Campos
Yi Chern Tan
Marek Rei
Max Bartolo
ReLM
LRM
129
0
0
12 Feb 2025
Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations
Kunal Handa
Alex Tamkin
Miles McCain
Saffron Huang
Esin Durmus
...
Kevin K. Troy
Dario Amodei
Jared Kaplan
Jack Clark
Deep Ganguli
MLAU
94
0
0
11 Feb 2025
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
Sihwan Park
Doohyuk Jang
Sungyub Kim
Souvik Kundu
Eunho Yang
83
0
0
10 Feb 2025
Automated Consistency Analysis of LLMs
Aditya Patwardhan
Vivek Vaidya
Ashish Kundu
69
1
0
10 Feb 2025
Expect the Unexpected: FailSafe Long Context QA for Finance
Kiran Kamble
M. Russak
Dmytro Mozolevskyi
Muayad Ali
Mateusz Russak
Waseem Alshikh
103
0
0
10 Feb 2025
C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
Guoxin Chen
Minpeng Liao
Peiying Yu
Dingmin Wang
Zile Qiao
Chao Yang
Xin Zhao
Kai Fan
68
1
0
10 Feb 2025
AI Alignment at Your Discretion
Maarten Buyl
Hadi Khalaf
C. M. Verdun
Lucas Monteiro Paes
Caio Vieira Machado
Flavio du Pin Calmon
57
0
0
10 Feb 2025
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
Chaoqun Liu
Wenxuan Zhang
Jiahao Ying
Mahani Aljunied
Anh Tuan Luu
Lidong Bing
ELM
67
1
0
10 Feb 2025
Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property Type
Seokwon Song
Taehyun Lee
Jaewoo Ahn
Jae Hyuk Sung
Gunhee Kim
CoGe
118
0
0
10 Feb 2025
Combining Large Language Models with Static Analyzers for Code Review Generation
Imen Jaoua
Oussama Ben Sghaier
Houari Sahraoui
82
1
0
10 Feb 2025
Unbiased Evaluation of Large Language Models from a Causal Perspective
Meilin Chen
Jian Tian
Liang Ma
Di Xie
Weijie Chen
Jiang Zhu
ALM
ELM
77
0
0
10 Feb 2025
Acceleration Multiple Heads Decoding for LLM via Dynamic Tree Attention
Zhendong Zhang
66
0
0
09 Feb 2025
Learning to Substitute Words with Model-based Score Ranking
Hongye Liu
Ricardo Henao
73
0
0
09 Feb 2025
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation
Zhiyong Yang
Keyang Lu
Chao Zhang
Jiaxing Qi
Hanqi Jiang
...
Yifan Xu
Mingzhe Xing
Zhen Xiao
Jieyi Long
Xiangde Liu
63
4
0
09 Feb 2025
SSH: Sparse Spectrum Adaptation via Discrete Hartley Transformation
Yixian Shen
Qi Bi
Jia-Hong Huang
Hongyi Zhu
Andy D. Pimentel
Anuj Pathania
69
0
0
08 Feb 2025
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization
Yongcheng Zeng
Xinyu Cui
Xuanfa Jin
Guoqing Liu
Zexu Sun
Quan He
Dong Li
Ning Yang
Haifeng Zhang
Jun Wang
LLMAG
LRM
119
1
0
08 Feb 2025
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding
Sukmin Cho
S. Choi
T. Hwang
Jeongyeon Seo
Soyeong Jeong
Huije Lee
Hoyun Song
Jong C. Park
Youngjin Kwon
58
0
0
08 Feb 2025
DeepThink: Aligning Language Models with Domain-Specific User Intents
Yang Li
Mingxuan Luo
Yeyun Gong
Chen Lin
Jian Jiao
Yi Liu
Kaili Huang
LRM
ALM
ELM
72
0
0
08 Feb 2025
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
Berk Atil
Vipul Gupta
Sarkar Snigdha Sarathi Das
R. Passonneau
275
0
0
07 Feb 2025
Self-Supervised Prompt Optimization
Jinyu Xiang
Jiayi Zhang
Zhaoyang Yu
Fengwei Teng
Jinhao Tu
Xinbing Liang
Sirui Hong
Chenglin Wu
Yuyu Luo
OffRL
LRM
80
9
0
07 Feb 2025
M-IFEval: Multilingual Instruction-Following Evaluation
Antoine Dussolle
Andrea Cardeña Díaz
Shota Sato
Peter Devine
ELM
75
0
0
07 Feb 2025
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
Mardhiyah Sanni
Tassallah Abdullahi
Devendra D. Kayande
Emmanuel Ayodele
Naome A. Etori
...
Chibuzor Okocha
L. Ismaila
Folafunmi Omofoye
Boluwatife A. Adewale
Tobi Olatunji
123
1
0
06 Feb 2025
PsyPlay: Personality-Infused Role-Playing Conversational Agents
Tao Yang
Yuhua Zhu
Xiaojun Quan
Cong Liu
Qifan Wang
95
1
0
06 Feb 2025
KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang
Kwan Ho Ryan Chan
D. Thaker
Jinqi Luo
René Vidal
AAML
53
0
0
05 Feb 2025
LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning
Zhekai Du
Yinjie Min
Jingjing Li
Ke Lu
Changliang Zou
Liuhua Peng
Tingjin Chu
Mingming Gong
271
1
0
05 Feb 2025
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Nikhil Bhendawade
Mahyar Najibi
Devang Naik
Irina Belousova
MoE
95
0
0
04 Feb 2025
Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models
Chia-Wen Kuo
Sijie Zhu
Fan Chen
Xiaohui Shen
Longyin Wen
VLM
72
1
0
04 Feb 2025
Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
Javier Rando
Jie Zhang
Nicholas Carlini
F. Tramèr
AAML
ELM
72
6
0
04 Feb 2025
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
Hongxin Li
Jingfan Chen
Jingran Su
Yuntao Chen
Qing Li
Zhaoxiang Zhang
305
0
0
04 Feb 2025
STAIR: Improving Safety Alignment with Introspective Reasoning
Yuanhang Zhang
Siyuan Zhang
Yao Huang
Zeyu Xia
Zhengwei Fang
Xiao Yang
Ranjie Duan
Dong Yan
Yinpeng Dong
Jun Zhu
LRM
LLMSV
65
5
0
04 Feb 2025
Agentic Bug Reproduction for Effective Automated Program Repair at Google
Runxiang Cheng
Michele Tufano
Jürgen Cito
J. Cambronero
Pat Rondon
Renyao Wei
Aaron Sun
S. Chandra
54
0
0
03 Feb 2025
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering
Zongxi Li
Yuante Li
Haoran Xie
S. J. Qin
87
0
0
03 Feb 2025
Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz
Stratis Tsirtsis
Eleni Straitouri
Ivi Chatzi
Ander Artola Velasco
Suhas Thejaswi
Manuel Gomez Rodriguez
58
0
0
03 Feb 2025
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
Jiawei Han
Xuzhi Zhang
Wei Wang
Huan Liu
74
22
0
03 Feb 2025
PARA: Parameter-Efficient Fine-tuning with Prompt Aware Representation Adjustment
Zequan Liu
Yi Zhao
Ming Tan
Wei Zhu
Aaron Xuxiang Tian
93
0
0
03 Feb 2025
Previous
1
2
3
...
14
15
16
...
60
61
62
Next