arXiv:2403.04132
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
7 March 2024
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
Dacheng Li
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
OSLM
Papers citing "Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference"
40 / 340 papers shown
Large Language Models Can Self-Improve At Web Agent Tasks
Ajay Patel
M. Hofmarcher
Claudiu Leoveanu-Condrei
Marius-Constantin Dinu
Chris Callison-Burch
Sepp Hochreiter
LLMAG
50
26
0
30 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Mengzhou Xia
Danqi Chen
68
388
0
23 May 2024
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Jingnan Zheng
Han Wang
An Zhang
Tai D. Nguyen
Jun Sun
Tat-Seng Chua
LLMAG
51
14
0
23 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
72
48
0
23 May 2024
SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling
Xingzhou Lou
Junge Zhang
Jian Xie
Lifeng Liu
Dong Yan
Kaiqi Huang
54
11
0
21 May 2024
WisPerMed at "Discharge Me!": Advancing Text Generation in Healthcare with Large Language Models, Dynamic Expert Selection, and Priming Techniques on MIMIC-IV
Hendrik Damm
T. M. G. Pakull
Bahadir Eryilmaz
Helmut Becker
Ahmad Idrissi-Yaghir
Henning Schäfer
Sergej Schultenkämper
Christoph M. Friedrich
36
3
0
18 May 2024
Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation
Julia Barnett
Kimon Kieslich
Nicholas Diakopoulos
45
3
0
15 May 2024
Understanding the performance gap between online and offline alignment algorithms
Yunhao Tang
Daniel Guo
Zeyu Zheng
Daniele Calandriello
Yuan Cao
...
Rémi Munos
Bernardo Avila-Pires
Michal Valko
Yong Cheng
Will Dabney
OffRL
OnRL
46
62
0
14 May 2024
FlockGPT: Guiding UAV Flocking with Linguistic Orchestration
Artem Lykov
Sausar Karaf
Mikhail Martynov
Valerii Serpiva
A. Fedoseev
Mikhail Konenkov
Dzmitry Tsetserukou
43
9
0
09 May 2024
Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning
Artem Lykov
Miguel Altamirano Cabrera
Koffivi Fidele Gbagbe
Dzmitry Tsetserukou
44
1
0
09 May 2024
SUTRA: Scalable Multilingual Language Model Architecture
Abhijit Bendale
Michael Sapienza
Steven Ripplinger
Simon Gibbs
Jaewon Lee
Pranav Mistry
LRM
ELM
41
4
0
07 May 2024
CACTUS: Chemistry Agent Connecting Tool-Usage to Science
Andrew D. McNaughton
Gautham Ramalaxmi
Agustin Kruel
C. Knutson
R. Varikoti
Neeraj Kumar
63
8
0
02 May 2024
What is Reproducibility in Artificial Intelligence and Machine Learning Research?
Abhyuday Desai
Mohamed Abdelhamid
N. R. Padalkar
AI4CE
32
2
0
29 Apr 2024
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti
Zhijing Jin
Max Kleiman-Weiner
Bernhard Schölkopf
Mrinmaya Sachan
Rada Mihalcea
LLMAG
65
15
0
25 Apr 2024
PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models
Shashi Kant Gupta
Aditya Basu
Mauro Nievas
Jerrin Thomas
Nathan Wolfrath
...
Regina Schwind
Therica M. Miller
Sorena Nadaf-Rahrov
Yanshan Wang
Hrituraj Singh
LM&MA
55
8
0
23 Apr 2024
Evaluating Large Language Models for Material Selection
Daniele Grandi
Yash Jain
Allin Groom
Brandon Cramer
Christopher McComb
42
8
0
23 Apr 2024
Does Instruction Tuning Make LLMs More Consistent?
Constanza Fierro
Jiaang Li
Anders Søgaard
LRM
45
2
0
23 Apr 2024
Apollonion: Profile-centric Dialog Agent
Shangyu Chen
Zibo Zhao
Yuanyuan Zhao
Xiang Li
LLMAG
45
1
0
10 Apr 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Kede Ma
Zhihua Wang
Qiang Zhang
Huajun Chen
47
10
0
10 Apr 2024
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Yu Ying Chiu
Amirhossein Ajalloeian
Maria Antoniak
Chan Young Park
Shuyue Stella Li
Mehar Bhatia
Sahithya Ravi
Yulia Tsvetkov
Vered Shwartz
Yejin Choi
47
20
0
10 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
69
33
0
08 Apr 2024
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers
Hussein Mozannar
Valerie Chen
Mohammed Alsobay
Subhro Das
Sebastian Zhao
Dennis L. Wei
Manish Nagireddy
P. Sattigeri
Ameet Talwalkar
David Sontag
ELM
51
18
0
03 Apr 2024
sDPO: Don't Use Your Data All at Once
Dahyun Kim
Yungi Kim
Wonho Song
Hyeonwoo Kim
Yunsu Kim
Sanghoon Kim
Chanjun Park
36
31
0
28 Mar 2024
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Orion Weller
Benjamin Chang
Sean MacAvaney
Kyle Lo
Arman Cohan
Benjamin Van Durme
Dawn J Lawrie
Luca Soldaini
63
30
0
22 Mar 2024
AutoEval Done Right: Using Synthetic Data for Model Evaluation
Pierre Boyeau
Anastasios Nikolas Angelopoulos
N. Yosef
Jitendra Malik
Michael I. Jordan
SyDa
49
14
0
09 Mar 2024
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
Yuan Ge
Yilun Liu
Chi Hu
Weibin Meng
Shimin Tao
Xiaofeng Zhao
Hongxia Ma
Li Zhang
Hao Yang
Tong Xiao
ALM
47
30
0
28 Feb 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
50
5
0
27 Feb 2024
Test-Driven Development for Code Generation
N. Mathews
Mei Nagappan
57
7
0
21 Feb 2024
AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy
P. Schoenegger
Peter S. Park
Ezra Karger
P. Tetlock
50
14
0
12 Feb 2024
Conditional and Modal Reasoning in Large Language Models
Wesley H. Holliday
M. Mandelkern
Cedegao E. Zhang
LRM
42
5
0
30 Jan 2024
Do LLMs Dream of Ontologies?
Marco Bombieri
Paolo Fiorini
Simone Paolo Ponzetto
M. Rospocher
CLL
45
2
0
26 Jan 2024
Enhancing Recommendation Diversity by Re-ranking with Large Language Models
Diego Carraro
Derek Bridge
LRM
ALM
67
15
0
21 Jan 2024
Prompt Valuation Based on Shapley Values
Hanxi Liu
Xiaokai Mao
Haocheng Xia
Jian Lou
Jinfei Liu
45
4
0
24 Dec 2023
SGLang: Efficient Execution of Structured Language Model Programs
Lianmin Zheng
Liangsheng Yin
Zhiqiang Xie
Chuyue Sun
Jeff Huang
...
Christos Kozyrakis
Ion Stoica
Joseph E. Gonzalez
Clark W. Barrett
Ying Sheng
LRM
47
121
0
12 Dec 2023
Novel Preprocessing Technique for Data Embedding in Engineering Code Generation Using Large Language Model
Yu-Chen Lin
Akhilesh Kumar
Norman Chang
Wen-Liang Zhang
Muhammad Zakir
Rucha Apte
Haiyang He
Chao Wang
Jyh-Shing Roger Jang
35
3
0
27 Nov 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
229
586
0
03 May 2023
Game-theoretic statistics and safe anytime-valid inference
Aaditya Ramdas
Peter Grünwald
V. Vovk
Glenn Shafer
64
123
0
04 Oct 2022
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska
Nader Akoury
Mohit Iyyer
228
107
0
14 Sep 2021
Online Active Model Selection for Pre-trained Classifiers
Mohammad Reza Karimi
Nezihe Merve Gürel
Bojan Karlaš
Johannes Rausch
Ce Zhang
Andreas Krause
36
22
0
19 Oct 2020
Estimating means of bounded random variables by betting
Ian Waudby-Smith
Aaditya Ramdas
61
151
0
19 Oct 2020