Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.05685
Cited By
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"
50 / 2,961 papers shown
Title
Effectively Steer LLM To Follow Preference via Building Confident Directions
Bingqing Song
Boran Han
Shuai Zhang
Hao Wang
Haoyang Fang
Bonan Min
Yuyang Wang
Mingyi Hong
LLMSV
57
0
0
04 Mar 2025
Towards Effective and Efficient Context-aware Nucleus Detection in Histopathology Whole Slide Images
Zhongyi Shui
Ruizhe Guo
Honglin Li
Yuxuan Sun
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Pingyi Chen
Yanzhou Su
Lin Yang
58
0
0
04 Mar 2025
LLM-Safety Evaluations Lack Robustness
Tim Beyer
Sophie Xhonneux
Simon Geisler
Gauthier Gidel
Leo Schwinn
Stephan Günnemann
ALM
ELM
287
0
0
04 Mar 2025
Improving LLM-as-a-Judge Inference with the Judgment Distribution
Victor Wang
Michael J.Q. Zhang
Eunsol Choi
72
1
0
04 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
52
1
0
04 Mar 2025
Don't Get Too Excited -- Eliciting Emotions in LLMs
Gino Franco Fazzi
Julie Skoven Hinge
Stefan Heinrich
Paolo Burelli
49
0
0
04 Mar 2025
OWLViz: An Open-World Benchmark for Visual Question Answering
T. Nguyen
Dang Nguyen
Hoang Nguyen
Thuan Luong
Long Hoang Dang
Viet Dac Lai
VLM
70
0
0
04 Mar 2025
Reflection on Data Storytelling Tools in the Generative AI Era from the Human-AI Collaboration Perspective
Haotian Li
Yuanbo Wang
Huamin Qu
43
0
0
04 Mar 2025
Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm
Zehan Li
Yuhao Du
Xiaoqi Jiao
Yiwen Guo
Yuege Feng
Xiang Wan
Anningzhe Gao
Jinpeng Hu
68
0
0
04 Mar 2025
Adversarial Tokenization
Renato Lui Geh
Zilei Shao
Guy Van den Broeck
SILM
AAML
92
0
0
04 Mar 2025
OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale
Haoyang Li
Shang Wu
Xiaokang Zhang
Xinmei Huang
Jing Zhang
...
Tieying Zhang
Jianjun Chen
Rui Shi
Hongyu Chen
C. Li
SyDa
85
6
0
04 Mar 2025
AutoEval: A Practical Framework for Autonomous Evaluation of Mobile Agents
Jiahui Sun
Zhichao Hua
Yubin Xia
50
0
0
04 Mar 2025
Llama-3.1-Sherkala-8B-Chat: An Open Large Language Model for Kazakh
Fajri Koto
Rituraj Joshi
Nurdaulet Mukhituly
Yunhong Wang
Zhuohan Xie
...
Avraham Sheinin
Natalia Vassilieva
Neha Sengupta
Larry Murray
Preslav Nakov
ALM
KELM
51
0
0
03 Mar 2025
Analyzing the Safety of Japanese Large Language Models in Stereotype-Triggering Prompts
Akito Nakanishi
Yukie Sano
Geng Liu
Francesco Pierri
60
0
0
03 Mar 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura
Sahil Wadhwa
Jesse Zymet
Akshay Gupta
Andy Luo
Melissa Kazemi Rad
Swapnil Shinde
Mohammad Sorower
AAML
284
0
0
03 Mar 2025
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom
Yisen Li
Lingfeng Yang
Wenxuan Shen
Pan Zhou
Yao Wan
Weiwei Lin
Danny Chen
80
0
0
03 Mar 2025
Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning
Wenjie Wu
Yongcheng Jing
Yingjie Wang
Wenbin Hu
Dacheng Tao
RALM
LRM
79
2
0
03 Mar 2025
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
Siddhant Arora
Zhiyun Lu
Chung-Cheng Chiu
Ruoming Pang
Shinji Watanabe
51
3
0
03 Mar 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou
Ke Mei
Yu Lu
Tianyi Wang
Fengyun Rao
94
2
0
03 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Yanzhe Zhang
Xiren Zhou
MoE
SyDa
78
32
0
03 Mar 2025
Advancing MAPF towards the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)
Jingtian Yan
Zhifei Li
William Kang
Yulun Zhang
Stephen Smith
Jiaoyang Li
53
0
0
03 Mar 2025
SwiLTra-Bench: The Swiss Legal Translation Benchmark
Joel Niklaus
Jakob Merane
Luka Nenadic
Sina Ahmadi
Yingqiang Gao
...
Matthew Guillod
Robin Mamié
Daniel Brunner
Julio Pereyra
Niko Grupen
AILaw
ELM
84
1
0
03 Mar 2025
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
Zhendong Wang
Jianmin Bao
Shuyang Gu
Dong Chen
Wengang Zhou
Haoyang Li
DiffM
53
0
0
03 Mar 2025
What do Large Language Models Say About Animals? Investigating Risks of Animal Harm in Generated Text
Arturs Kanepajs
Aditi Basu
Sankalpa Ghose
Constance Li
Akshat Mehta
Ronak Mehta
Samuel David Tucker-Davis
Eric Zhou
Bob Fischer
ALM
ELM
53
0
0
03 Mar 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang
Shaobin Zhuang
Canmiao Fu
Binxin Yang
Ying Zhang
Chong Sun
Zhizheng Zhang
Yali Wang
Chen Li
Zheng-Jun Zha
DiffM
74
2
0
03 Mar 2025
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Yun Wang
Pei Zhang
Siyuan Huang
Baosong Yang
Zizhuo Zhang
Fei Huang
Rui Wang
BDL
LRM
74
8
0
03 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
123
6
0
03 Mar 2025
Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
Miao Peng
Nuo Chen
Zongrui Suo
Jia Li
LRM
45
1
0
02 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
231
0
0
02 Mar 2025
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
Kai Lv
Honglin Guo
Qipeng Guo
Xipeng Qiu
46
0
0
02 Mar 2025
Evaluating Polish linguistic and cultural competency in large language models
Sławomir Dadas
Małgorzata Grębowiec
Michał Perełkiewicz
Rafał Poświata
ELM
52
2
0
02 Mar 2025
PodAgent: A Comprehensive Framework for Podcast Generation
Yujia Xiao
Lei He
Haohan Guo
Fenglong Xie
Tan Lee
263
0
0
01 Mar 2025
Distributionally Robust Reinforcement Learning with Human Feedback
Debmalya Mandal
Paulius Sasnauskas
Goran Radanović
44
1
0
01 Mar 2025
BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
Terry Tong
Fei Wang
Zhe Zhao
Mengzhao Chen
AAML
ELM
49
2
0
01 Mar 2025
Steer LLM Latents for Hallucination Detection
Seongheon Park
Xuefeng Du
Min-Hsuan Yeh
Haobo Wang
Yixuan Li
LLMSV
65
2
0
01 Mar 2025
Challenges in Testing Large Language Model Based Software: A Faceted Taxonomy
Felix Dobslaw
R. Feldt
Juyeon Yoon
S. Yoo
51
0
0
01 Mar 2025
Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?
Perla Al Almaoui
Pierrette Bouillon
Simon Hengchen
67
0
0
28 Feb 2025
PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
Sunghyeon Woo
Sol Namkung
Sunwoo Lee
Inho Jeong
Beomseok Kim
Dongsuk Jeon
45
0
0
28 Feb 2025
WorldModelBench: Judging Video Generation Models As World Models
Dacheng Li
Yunhao Fang
Yukang Chen
Shuo Yang
Shiyi Cao
...
Hongxu Yin
Joseph E. Gonzalez
Ion Stoica
Enze Xie
Yaojie Lu
VGen
60
4
0
28 Feb 2025
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu
Alexander Robey
Changliu Liu
AAML
LLMSV
52
2
0
28 Feb 2025
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
J.N. Zhang
Xuan Yang
Tianfu Wang
Yu Yao
Aleksandr Petiushko
B. Li
45
0
0
28 Feb 2025
Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
Fakhraddin Alwajih
Abdellah El Mekki
Samar Magdy
AbdelRahim Elmadany
Omer Nacar
...
Anis Koubaa
Ismail Berrada
Mustafa Jarrar
Shady Shehata
Muhammad Abdul-Mageed
55
2
0
28 Feb 2025
ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer
Omer Goldman
Uri Shaham
Dan Malkin
Sivan Eiger
Avinatan Hassidim
...
Shruti Rijhwani
Laura Rimell
Idan Szpektor
Reut Tsarfaty
Matan Eyal
47
4
0
28 Feb 2025
Personalized Causal Graph Reasoning for LLMs: A Case Study on Dietary Recommendations
Zhongqi Yang
Amir M. Rahmani
46
0
0
28 Feb 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
62
8
0
27 Feb 2025
Long-Context Inference with Retrieval-Augmented Speculative Decoding
Guanzheng Chen
Qilong Feng
Jinjie Ni
Xin Li
Michael Shieh
RALM
60
2
0
27 Feb 2025
Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models
Heeseung Kim
Che Hyun Lee
S. Park
Jiheum Yeom
Nohil Park
Sangwon Yu
Sungroh Yoon
76
0
0
27 Feb 2025
Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation
Shaharukh Khan
Ayush Tarun
Ali Faraz
Palash Kamble
Vivek Dahiya
Praveen Kumar Pokala
Ashish Kulkarni
Chandra Khatri
Abhinav Ravi
Shubham Agarwal
254
0
0
27 Feb 2025
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants
Franck Cappello
Sandeep Madireddy
Robert Underwood
N. Getty
Nicholas Chia
...
M. Rafique
Eliu A. Huerta
Yangqiu Song
Ian Foster
Rick L. Stevens
84
1
0
27 Feb 2025
Preference Learning Unlocks LLMs' Psycho-Counseling Skills
Mian Zhang
S. Eack
Zhiyu Zoey Chen
84
1
0
27 Feb 2025
Previous
1
2
3
...
9
10
11
...
58
59
60
Next