2306.05685
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
Papers citing
"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"
50 / 2,926 papers shown
Title
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Dahua Lin
Jiaqi Wang
OffRL
60
2
0
10 Apr 2025
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini
A. Orsino
Massimo Ruggiero
Domenico Talia
AAML
ELM
50
1
0
10 Apr 2025
NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark
Vladislav Mikhailov
Tita Ranveig Enstad
David Samuel
Hans Christian Farsethås
Andrey Kutuzov
Erik Velldal
Lilja Øvrelid
ELM
45
0
0
10 Apr 2025
Synthesizing High-Quality Programming Tasks with LLM-based Expert and Student Agents
Manh Hung Nguyen
Victor-Alexandru Pădurean
Alkis Gotovos
Sebastian Tschiatschek
Adish Singla
26
0
0
10 Apr 2025
Efficient Tuning of Large Language Models for Knowledge-Grounded Dialogue Generation
Bo Zhang
Hui Ma
Dailin Li
Jian Ding
Jian Wang
Bo Xu
Hongfei Lin
KELM
46
0
0
10 Apr 2025
Enhanced Question-Answering for Skill-based learning using Knowledge-based AI and Generative AI
Rahul K. Dass
Rochan H. Madhusudhana
Erin C. Deye
Shashank Verma
Timothy A. Bydlon
Grace Brazil
Ashok K. Goel
36
1
0
10 Apr 2025
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery
Amirhossein Abaskohi
A. Ramesh
Shailesh Nanisetty
Chirag Goel
David Vazquez
Christopher Pal
Spandana Gella
Giuseppe Carenini
I. Laradji
41
0
0
10 Apr 2025
A System for Comprehensive Assessment of RAG Frameworks
Mattia Rengo
Senad Beadini
Domenico Alfano
Roberto Abbruzzese
48
1
0
10 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
51
0
0
10 Apr 2025
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models
Sher Badshah
Ali Emami
Hassan Sajjad
LLMAG
ELM
45
0
0
10 Apr 2025
Beyond Reproducibility: Advancing Zero-shot LLM Reranking Efficiency with Setwise Insertion
Jakub Podolak
Leon Peric
Mina Janicijevic
Roxana Petcu
28
0
0
09 Apr 2025
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models
Yashar Deldjoo
Nikhil Mehta
M. Sathiamoorthy
Shuai Zhang
Pablo Castells
Julian McAuley
EGVM
ELM
72
1
0
09 Apr 2025
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
Jiacheng Liu
Taylor Blanton
Yanai Elazar
Sewon Min
YenSung Chen
...
Sophie Lebrecht
Yejin Choi
Hannaneh Hajishirzi
Ali Farhadi
Jesse Dodge
44
1
0
09 Apr 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang
Xue Yang
Longyue Wang
Zhenran Xu
Yansen Wang
Yaowei Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
EGVM
DiffM
72
0
0
09 Apr 2025
A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models
Zhouhang Xie
Junda Wu
Yiran Shen
Yu Xia
Xintong Li
...
Sachin Kumar
Bodhisattwa Prasad Majumder
Jingbo Shang
Prithviraj Ammanabrolu
Julian McAuley
45
1
0
09 Apr 2025
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
Boyuan Zheng
Michael Y. Fatemi
Xiaolong Jin
Ziyi Wang
Apurva Gandhi
...
Yu Gu
Jayanth Srinivasa
Gaowen Liu
Graham Neubig
Yu Su
CLL
45
1
0
09 Apr 2025
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators
Xitao Li
Haoran Wang
Jiang Wu
Ting Liu
AAML
31
0
0
08 Apr 2025
Can LLMs Simulate Personas with Reversed Performance? A Benchmark for Counterfactual Instruction Following
Sai Adith Senthil Kumar
Hao Yan
Saipavan Perepa
Murong Yue
Ziyu Yao
62
0
0
08 Apr 2025
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Shuzhang Zhong
Yizhou Sun
Ling Liang
Runsheng Wang
R. Huang
Meng Li
MoE
61
1
0
08 Apr 2025
Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning
Yuehan Qin
Shawn Li
Yi Nian
Xinyan Velocity Yu
Yue Zhao
Xuezhe Ma
HILM
LRM
51
0
0
08 Apr 2025
Knowledge-Instruct: Effective Continual Pre-training from Limited Data using Instructions
O. Ovadia
Meni Brief
Rachel Lemberg
Eitam Sheetrit
CLL
KELM
54
0
0
08 Apr 2025
Information-Theoretic Reward Decomposition for Generalizable RLHF
Liyuan Mao
Haoran Xu
Amy Zhang
Weinan Zhang
Chenjia Bai
40
0
0
08 Apr 2025
Single-Agent vs. Multi-Agent LLM Strategies for Automated Student Reflection Assessment
Gen Li
Li Chen
Cheng Tang
Valdemar Švábenský
Daisuke Deguchi
Takayoshi Yamashita
Atsushi Shimada
LLMAG
62
0
0
08 Apr 2025
CARE: Aligning Language Models for Regional Cultural Awareness
Geyang Guo
Tarek Naous
Hiromi Wakaki
Yukiko Nishimura
Yuki Mitsufuji
Alan Ritter
Wei Xu
62
1
0
07 Apr 2025
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Anna Goldie
Azalia Mirhoseini
Hao Zhou
Irene Cai
Christopher D. Manning
SyDa
OffRL
ReLM
LRM
117
3
0
07 Apr 2025
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
AAML
33
0
0
07 Apr 2025
R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation
Martin Weyssow
Chengran Yang
Junkai Chen
Yikun Li
Huihui Huang
...
Han Wei Ang
Frank Liauw
Eng Lieh Ouh
Lwin Khin Shar
David Lo
LRM
35
0
0
07 Apr 2025
A Llama walks into the 'Bar': Efficient Supervised Fine-Tuning for Legal Reasoning in the Multi-state Bar Exam
Rean Fernandes
André Biedenkapp
Frank Hutter
Noor H. Awad
ALM
ELM
LRM
45
0
0
07 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
54
9
0
07 Apr 2025
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang
Yulin Chen
Jane Pan
Chen Zhao
Aurojit Panda
Jinyang Li
He He
ReLM
LRM
55
6
0
07 Apr 2025
CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models
Kavana Venkatesh
Connor Dunlop
Pinar Yanardag
DiffM
40
0
0
07 Apr 2025
EduPlanner: LLM-Based Multi-Agent Systems for Customized and Intelligent Instructional Design
Xinsong Zhang
Chao Zhang
Jianwen Sun
Jun Xiao
Yi Yang
Yawei Luo
LLMAG
AI4Ed
60
0
0
07 Apr 2025
NoveltyBench: Evaluating Language Models for Humanlike Diversity
Yiming Zhang
Harshita Diddee
Susan Holm
Hanchen Liu
Xinyue Liu
Vinay Samuel
Barry Wang
Daphne Ippolito
37
1
0
07 Apr 2025
DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation
Xinglin Lyu
Wei Tang
Yongqian Li
X. Zhao
Ming Zhu
...
Yaojie Lu
Min Zhang
Daimeng Wei
Hao Yang
Min Zhang
81
0
0
07 Apr 2025
Video-Bench: Human-Aligned Video Generation Benchmark
Hui Han
Siyuan Li
Jiaqi Chen
Yiwen Yuan
Yuling Wu
...
You Li
Jingyang Zhang
Chi Zhang
Li Li
Yongxin Ni
EGVM
VGen
73
0
0
07 Apr 2025
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Minki Kang
Jongwon Jeong
Jaewoong Cho
ALM
LRM
57
2
0
07 Apr 2025
ArxivBench: Can LLMs Assist Researchers in Conducting Research?
Ning Li
Jingran Zhang
Justin Cui
29
0
0
06 Apr 2025
SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities
Noga Ben Yoash
Meni Brief
O. Ovadia
Gil Shenderovitz
Moshik Mishaeli
Rachel Lemberg
Eitam Sheetrit
ELM
AIFin
35
0
0
06 Apr 2025
Advancing Egocentric Video Question Answering with Multimodal Large Language Models
Alkesh Patel
Vibhav Chitalia
Yinfei Yang
30
0
0
06 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
60
1
0
05 Apr 2025
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Wasi Uddin Ahmad
Aleksander Ficek
Mehrzad Samadi
Jocelyn Huang
Vahid Noroozi
Somshubra Majumdar
Boris Ginsburg
ALM
50
1
0
05 Apr 2025
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation
J. S. Park
Jinho Park
Dongju Jang
Jiwan Chung
Byungwoo Yoo
Jaewoo Shin
S. Park
Taehyeong Kim
Youngjae Yu
48
0
0
04 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
47
0
0
04 Apr 2025
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Bingxiang He
Wenbin Zhang
Jiaxi Song
Cheng Qian
Z. Fu
...
Hui Xue
Ganqu Cui
Wanxiang Che
Zhiyuan Liu
Maosong Sun
39
0
0
04 Apr 2025
Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty
Yu Inatsu
46
0
0
04 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Ziwei Sun
Yang Liu
Y. Wu
OffRL
LRM
56
21
0
03 Apr 2025
Noiser: Bounded Input Perturbations for Attributing Large Language Models
Mohammad Reza Ghasemi Madani
Aryo Pradipta Gema
Gabriele Sarti
Yu Zhao
Pasquale Minervini
Andrea Passerini
AAML
40
0
0
03 Apr 2025
Cultural Learning-Based Culture Adaptation of Language Models
Chen Cecilia Liu
Anna Korhonen
Iryna Gurevych
48
0
0
03 Apr 2025
CoLa -- Learning to Interactively Collaborate with Large LMs
Abhishek Sharma
Dan Goldwasser
LLMAG
SyDa
72
0
0
03 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
79
0
0
03 Apr 2025