2403.04132
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
7 March 2024
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
Dacheng Li
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
OSLM
Papers citing "Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference"
50 / 340 papers shown
Title
What Are They Filtering Out? A Survey of Filtering Strategies for Harm Reduction in Pretraining Datasets
Marco Antonio Stranisci
Christian Hardmeier
80
0
0
17 Feb 2025
Blessing of Multilinguality: A Systematic Analysis of Multilingual In-Context Learning
Yilei Tu
Andrew Xue
Freda Shi
53
0
0
17 Feb 2025
Leveraging Uncertainty Estimation for Efficient LLM Routing
Tuo Zhang
Asal Mehradfar
Dimitrios Dimitriadis
Salman Avestimehr
72
1
0
16 Feb 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou
Yuda Song
Andrea Zanette
ALM
89
0
0
14 Feb 2025
DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
Xueyi Liu
Jianibieke Adalibieke
Qianwei Han
Yuzhe Qin
Li Yi
109
3
0
13 Feb 2025
Improving Existing Optimization Algorithms with LLMs
Camilo Chacón Sartori
Christian Blum
55
1
0
12 Feb 2025
Hookpad Aria: A Copilot for Songwriters
Chris Donahue
Shih-Lun Wu
Yewon Kim
Dave Carlton
Ryan Miyakawa
John Thickstun
76
1
0
12 Feb 2025
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang
Wenhao Zhu
Hanxu Hu
Zeang Sheng
Lei Li
Shujian Huang
Fei Yuan
ELM
75
4
0
11 Feb 2025
Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
Atharva Mehta
Shivam Chauhan
Amirbek Djanibekov
Atharva Kulkarni
Gus Xia
Monojit Choudhury
69
0
0
11 Feb 2025
AI Alignment at Your Discretion
Maarten Buyl
Hadi Khalaf
C. M. Verdun
Lucas Monteiro Paes
Caio Vieira Machado
Flavio du Pin Calmon
57
0
0
10 Feb 2025
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Anna Arias-Duart
Pablo A. Martin-Torres
Daniel Hinjos
Pablo Bernabeu Perez
Lucia Urcelay-Ganzabal
Marta Gonzalez-Mallo
Ashwin Kumar Gururajan
Enrique Lopez-Cuena
Sergio Alvarez-Napagao
Dario Garcia-Gasulla
LM&MA
ELM
133
1
0
10 Feb 2025
Enabling Autoregressive Models to Fill In Masked Tokens
Daniel Israel
Aditya Grover
Guy Van den Broeck
AI4CE
68
1
0
09 Feb 2025
Proving the Coding Interview: A Benchmark for Formally Verified Code Generation
Quinn Dougherty
Ronak Mehta
ALM
64
1
0
08 Feb 2025
Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz
Stratis Tsirtsis
Eleni Straitouri
Ivi Chatzi
Ander Artola Velasco
Suhas Thejaswi
Manuel Gomez Rodriguez
58
0
0
03 Feb 2025
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Rui Min
Tianyu Pang
Chao Du
Qian Liu
Minhao Cheng
Min Lin
AAML
62
4
0
29 Jan 2025
SedarEval: Automated Evaluation using Self-Adaptive Rubrics
Zhiyuan Fan
Weinong Wang
Xing Wu
Debing Zhang
41
1
0
28 Jan 2025
DeServe: Towards Affordable Offline LLM Inference via Decentralization
Linyu Wu
Xiaoyuan Liu
Tianneng Shi
Zhe Ye
D. Song
OffRL
73
0
0
28 Jan 2025
MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models
Zhongpu Chen
Yixiao Liu
Long Shi
Zhi-Jie Wang
Xingyan Chen
Yu Zhao
Fuji Ren
58
1
0
28 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
Wentao Zhang
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
103
19
0
21 Jan 2025
Multi-Agent Collaboration Mechanisms: A Survey of LLMs
Khanh-Tung Tran
Dung Dao
Minh-Duong Nguyen
Quoc-Viet Pham
Barry O'Sullivan
Hoang D. Nguyen
LLMAG
103
32
0
10 Jan 2025
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models
Hao Li
Cor-Paul Bezemer
Ahmed E. Hassan
50
2
0
08 Jan 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang
Zhangchen Xu
Luyao Niu
Bill Yuchen Lin
Radha Poovendran
SILM
81
9
0
08 Jan 2025
Accounting for Focus Ambiguity in Visual Questions
Chongyan Chen
Yu-Yun Tseng
Zhuoheng Li
Anush Venkatesh
Danna Gurari
59
0
0
04 Jan 2025
LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena
Stefan Pasch
68
0
0
04 Jan 2025
Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection
Yachao Zhao
Bo Wang
Yan Wang
65
3
0
04 Jan 2025
Real-time Fake News from Adversarial Feedback
Sanxing Chen
Yukun Huang
Bhuwan Dhingra
44
0
0
31 Dec 2024
Towards Effective Discrimination Testing for Generative AI
Thomas P. Zollo
Nikita Rajaneesh
Richard Zemel
Talia B. Gillis
Emily Black
86
1
0
31 Dec 2024
A Statistical Framework for Ranking LLM-Based Chatbots
Siavash Ameli
Siyuan Zhuang
Ion Stoica
Michael W. Mahoney
ELM
63
1
0
24 Dec 2024
WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
Huawen Feng
Pu Zhao
Qingfeng Sun
Can Xu
Fangkai Yang
...
Qianli Ma
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
AAML
ALM
69
0
0
23 Dec 2024
NILE: Internal Consistency Alignment in Large Language Models
Minda Hu
Qiyuan Zhang
Yufei Wang
Bowei He
Hongru Wang
Jingyan Zhou
Liangyou Li
Yasheng Wang
Chen Ma
Irwin King
105
0
0
21 Dec 2024
Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying
Federico Castagna
I. Sassoon
Simon Parsons
LRM
94
0
0
19 Dec 2024
Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games
Wenye Lin
Jonathan Roberts
Yunhan Yang
Samuel Albanie
Zongqing Lu
Kai Han
LRM
ELM
95
1
0
18 Dec 2024
Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement
Qianyue Wang
Jinwu Hu
Zhengping Li
Yufeng Wang
Daiyuan Li
Yu Hu
Mingkui Tan
106
4
0
18 Dec 2024
Why Does ChatGPT "Delve" So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models
Tom S. Juzek
Zina B. Ward
99
1
0
16 Dec 2024
Generics are puzzling. Can language models find the missing piece?
Gustavo Cilleruelo Calderón
Emily Allaway
Barry Haddow
Alexandra Birch
83
0
0
15 Dec 2024
Reliable, Reproducible, and Really Fast Leaderboards with Evalica
Dmitry Ustalov
ALM
ELM
94
0
0
15 Dec 2024
RecSys Arena: Pair-wise Recommender System Evaluation with Large Language Models
Zhuo Wu
Qinglin Jia
Chuhan Wu
Zhaocheng Du
Shuai Wang
Zihan Wang
Zhenhua Dong
OffRL
79
0
0
15 Dec 2024
Cultural Evolution of Cooperation among LLM Agents
Aron Vallinder
Edward Hughes
LLMAG
114
5
0
13 Dec 2024
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Joey Tianyi Zhou
103
0
0
10 Dec 2024
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
Lei Wang
Jianxun Lian
Yi Huang
Yanqi Dai
Haoxuan Li
Xu Chen
Xing Xie
Ji-Rong Wen
LLMAG
99
3
0
07 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
138
8
0
05 Dec 2024
Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
161
5
0
04 Dec 2024
Optimizing Large Language Models for Turkish: New Methodologies in Corpus Selection and Training
Himmet Toprak Kesgin
M. K. Yuce
Eren Dogan
M. E. Uzun
Atahan Uz
Elif Ince
Yusuf Erdem
Osama Shbib
Ahmed Zeer
M. Fatih Amasyali
79
0
0
03 Dec 2024
Cosmos-LLaVA: Chatting with the Visual
Ahmed Zeer
Eren Dogan
Yusuf Erdem
Elif Ince
Osama Shbib
M. E. Uzun
Atahan Uz
M. K. Yuce
Himmet Toprak Kesgin
M. Fatih Amasyali
VLM
91
0
0
03 Dec 2024
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?
Sourav Banerjee
Ayushi Agarwal
Eishkaran Singh
ELM
93
2
0
02 Dec 2024
Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor
Ashwin Baluja
80
3
0
01 Dec 2024
A Flexible Defense Against the Winner's Curse
Tijana Zrnic
William Fithian
77
1
0
27 Nov 2024
A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld
Emery Cooper
Miles Kodama
Linh Chi Nguyen
Ethan Perez
59
1
0
15 Nov 2024
Stronger Models are NOT Stronger Teachers for Instruction Tuning
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Bill Yuchen Lin
Radha Poovendran
ALM
63
6
0
11 Nov 2024
LLM-GLOBE: A Benchmark Evaluating the Cultural Values Embedded in LLM Output
Elise Karinshak
Amanda Hu
Kewen Kong
Vishwanatha Rao
Jingren Wang
Jindong Wang
Yi Zeng
73
2
0
09 Nov 2024