Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.05685
Cited By
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"
50 / 2,961 papers shown
Title
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
Ruihan Yang
Fanghua Ye
Jian Li
Siyu Yuan
Yikai Zhang
Zhaopeng Tu
Xiaolong Li
Deqing Yang
LLMAG
78
4
0
20 Mar 2025
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
Yifan Sun
Han Wang
Dongbai Li
Gang Wang
Huan Zhang
AAML
66
0
0
20 Mar 2025
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities
Aly M. Kassem
Bernhard Schölkopf
Zhijing Jin
31
0
0
20 Mar 2025
Am I eligible? Natural Language Inference for Clinical Trial Patient Recruitment: the Patient's Point of View
Mathilde Aguiar
Pierre Zweigenbaum
Nona Naderi
LM&MA
51
0
0
19 Mar 2025
VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning
Y. Tan
Chen Liu
Jingyuan Gao
Banghao Wu
Mingchen Li
...
Lingrong Zhang
Huiqun Yu
Guisheng Fan
Liang Hong
Bingxin Zhou
63
1
0
19 Mar 2025
Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations
Shuo Li
Jiajun Sun
Guodong Zheng
Xiaoran Fan
Yujiong Shen
...
Wenming Tan
Tao Ji
Tao Gui
Qi Zhang
Xuanjing Huang
AAML
VLM
95
1
0
19 Mar 2025
A Language Vision Model Approach for Automated Tumor Contouring in Radiation Oncology
Yi Luo
H. Hooshangnejad
Xue Feng
Gaofeng Huang
Xiao Chen
Rui Zhang
Quan Chen
Wil Ngwa
Kai Ding
65
0
0
19 Mar 2025
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang
Munan Ning
Zheyuan Liu
Yanbo Wang
Jiayi Ye
Yue Huang
Shuo Yang
Xiao Chen
Y. Song
Li Yuan
LRM
67
0
0
19 Mar 2025
Where do Large Vision-Language Models Look at when Answering Questions?
X. Xing
Chia-Wen Kuo
Li Fuxin
Yulei Niu
Fan Chen
Ming Li
Ying Wu
Longyin Wen
Sijie Zhu
LRM
67
0
0
18 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji
L. Yu
Yeskendir Koishekenov
Yejin Bang
Anthony Hartshorn
Alan Schelten
Cheng Zhang
Pascale Fung
Nicola Cancedda
57
1
0
18 Mar 2025
Synthetic Clarification and Correction Dialogues about Data-Centric Tasks -- A Teacher-Student Approach
Christian Poelitz
Nick McKenna
54
1
0
18 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
91
4
0
18 Mar 2025
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
Donggon Jang
Yucheol Cho
Suin Lee
Taehyeon Kim
Dae-Shik Kim
VLM
70
1
0
18 Mar 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
68
0
0
18 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
88
1
0
17 Mar 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Wan Ju Kang
Eunki Kim
Na Min An
Sangryul Kim
Haemin Choi
Ki Hoon Kwak
James Thorne
54
0
0
17 Mar 2025
Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs
Jasmin Wachter
Michael Radloff
Maja Smolej
Katharina Kinder-Kurlanda
49
0
0
17 Mar 2025
Can Language Models Follow Multiple Turns of Entangled Instructions?
Chi Han
ELM
LRM
55
1
0
17 Mar 2025
A Survey on the Optimization of Large Language Model-based Agents
Shangheng Du
Jiabao Zhao
Jinxin Shi
Zhentao Xie
Xin Jiang
Yanhong Bai
Liang He
LLMAG
LM&Ro
LM&MA
319
1
0
16 Mar 2025
Unveiling Pitfalls: Understanding Why AI-driven Code Agents Fail at GitHub Issue Resolution
Zhi Chen
Wei Ma
Lingxiao Jiang
LLMAG
58
0
0
16 Mar 2025
From Demonstrations to Rewards: Alignment Without Explicit Human Preferences
Siliang Zeng
Yao Liu
Huzefa Rangwala
George Karypis
Mingyi Hong
Rasool Fakoor
57
2
0
15 Mar 2025
A Survey on Federated Fine-tuning of Large Language Models
Yebo Wu
Chunlin Tian
Jingguang Li
He Sun
Kahou Tam
Li Li
Chengzhong Xu
FedML
86
0
0
15 Mar 2025
Cross-Modal Learning for Music-to-Music-Video Description Generation
Zhuoyuan Mao
Mengjie Zhao
Qiyu Wu
Zhi-Wei Zhong
Wei-Hsiang Liao
Hiromi Wakaki
Yuki Mitsufuji
DiffM
VGen
87
0
0
14 Mar 2025
Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring
Kezia Oketch
John P. Lalor
Yi Yang
Ahmed Abbasi
ELM
57
1
0
14 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
59
1
0
14 Mar 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELM
LRM
62
0
0
14 Mar 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
Chuhan Zhang
Chaoyang Zhu
Pingcheng Dong
Long Chen
Dong Zhang
ObjD
VLM
254
0
0
14 Mar 2025
D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning
Jia Zhang
Chen-Xi Zhang
Yang Liu
Yi-Xuan Jin
Xiao-Wen Yang
Bo Zheng
Yi Liu
Lan-Zhe Guo
54
2
0
14 Mar 2025
Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
Shunqi Mao
Chaoyi Zhang
Weidong Cai
MLLM
241
0
0
13 Mar 2025
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1
Zhaoyi Li
Xiaohan Zhao
Dong-Dong Wu
Jiacheng Cui
Zhiqiang Shen
AAML
VLM
77
1
0
13 Mar 2025
SCE: Scalable Consistency Ensembles Make Blackbox Large Language Model Generation More Reliable
Jiaxin Zhang
Zechao Li
Wendi Cui
Kamalika Das
Bradley Malin
Sricharan Kumar
54
0
0
13 Mar 2025
Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan
Solon Barocas
Kenneth Holstein
Hanna M. Wallach
Zhiwei Steven Wu
Alexandra Chouldechova
ALM
ELM
320
0
0
13 Mar 2025
Source-primed Multi-turn Conversation Helps Large Language Models Translate Documents
Hanxu Hu
Jannis Vamvas
Rico Sennrich
55
0
0
13 Mar 2025
Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark
Viktor Moskvoretskii
Alina Lobanova
Ekaterina Neminova
Chris Biemann
Alexander Panchenko
Irina Nikishina
52
0
0
13 Mar 2025
Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?
So Young Lee
Russell Scheinberg
Amber Shore
Ameeta Agrawal
58
1
0
13 Mar 2025
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
Jiajun Li
Yixing Xu
Haiduo Huang
Xuanwu Yin
D. Li
Edith C. -H. Ngai
E. Barsoum
63
0
0
13 Mar 2025
Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
Hongyu Chen
Seraphina Goldfarb-Tarrant
50
0
0
12 Mar 2025
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
Ruohao Guo
Wei Xu
Alan Ritter
44
1
0
12 Mar 2025
Large Language Models-Aided Program Debloating
Bo Lin
Shangwen Wang
Yihao Qin
Liqian Chen
Xiaoguang Mao
60
0
0
12 Mar 2025
DAVE: Diagnostic benchmark for Audio Visual Evaluation
Gorjan Radevski
Teodora Popordanoska
Matthew B. Blaschko
Tinne Tuytelaars
63
0
0
12 Mar 2025
Conversational Gold: Evaluating Personalized Conversational Search System using Gold Nuggets
Zahra Abbasiantaeb
Simon Lupart
Leif Azzopardi
Jeffery Dalton
Mohammad Aliannejadi
RALM
65
1
0
12 Mar 2025
RigoChat 2: an adapted language model to Spanish using a bounded dataset and reduced hardware
Gonzalo Santamaría Gómez
Guillem García Subies
Pablo Gutiérrez Ruiz
Mario González Valero
Natàlia Fuertes
...
Nuria Aldama García
David Betancur Sánchez
Kateryna Sushkova
Marta Guerrero Nieto
Á. Jiménez
56
0
0
11 Mar 2025
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents
Haoyu Wang
Sunhao Dai
Haiyuan Zhao
Liang Pang
Xiao Zhang
Gang Wang
Zhenhua Dong
Jun Xu
Ji-Rong Wen
77
2
0
11 Mar 2025
Graph of AI Ideas: Leveraging Knowledge Graphs and LLMs for AI Research Idea Generation
Xian Gao
Zongyun Zhang
Mingye Xie
Ting Liu
Yuzhuo Fu
42
0
0
11 Mar 2025
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval
Parishad BehnamGhader
Nicholas Meade
Siva Reddy
67
1
0
11 Mar 2025
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering
Sher Badshah
Hassan Sajjad
73
1
0
11 Mar 2025
Group Preference Alignment: Customized LLM Response Generation from In-Situ Conversations
Ishani Mondal
Jack W. Stokes
S. Jauhar
Longqi Yang
Mengting Wan
Xiaofeng Xu
Xia Song
Jennifer Neville
57
0
0
11 Mar 2025
Counterfactual Language Reasoning for Explainable Recommendation Systems
Ge Li
Haolin Yang
Xinyu Liu
Zhen Wu
Xinyu Dai
LRM
CML
50
0
0
11 Mar 2025
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
Zhiyuan Zeng
Yizhong Wang
Hannaneh Hajishirzi
Pang Wei Koh
ELM
66
6
0
11 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad
Vibhav Vineet
Yogesh S Rawat
VLM
241
1
0
11 Mar 2025
Previous
1
2
3
...
7
8
9
...
58
59
60
Next