
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
Dacheng Li
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
Papers citing "Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference"
Title | Authors |
---|---|
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Bofei Gao, Feifan Song, Zhiyong Yang, Zefan Cai, Yibo Miao, ..., Lei Sha, Yichang Zhang, Xuancheng Ren, Tianyu Liu, Baobao Chang |
An evaluation of LLM code generation capabilities through graded exercises | Álvaro Barbero Jiménez |
CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs | Yu Ying Chiu, Liwei Jiang, Bill Yuchen Lin, Chan Young Park, Shuyue Stella Li, ..., Mehar Bhatia, Maria Antoniak, Yulia Tsvetkov, Vered Shwartz, Yejin Choi |