2407.18901
Cited By
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
26 July 2024
H. Trivedi
Tushar Khot
Mareike Hartmann
R. Manku
Vinty Dong
Edward Li
Shashank Gupta
Ashish Sabharwal
Niranjan Balasubramanian
VGen
LLMAG
Papers citing "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents"
9 / 9 papers shown
Title
lmgame-Bench: How Good are LLMs at Playing Games?
Lanxiang Hu
Mingjia Huo
Yu Zhang
Haoyang Yu
Eric P. Xing
Ion Stoica
Tajana Rosing
Haojian Jin
Hao Zhang
58
1
0
21 May 2025
TALES: Text Adventure Learning Environment Suite
Christopher Zhang Cui
Xingdi Yuan
Ziang Xiao
Prithviraj Ammanabrolu
Marc-Alexandre Côté
LLMAG
LRM
71
2
0
19 Apr 2025
TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials
Bofei Zhang
Zirui Shang
Zhi Gao
Wang Zhang
Rui Xie
Xiaojian Ma
Tao Yuan
Xinxiao Wu
Song-Chun Zhu
Qing Li
LLMAG
62
3
0
17 Apr 2025
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Peijie Yu
Yifan Yang
Jiajian Li
Zelong Zhang
Haorui Wang
Xiao Feng
Feng Zhang
LLMAG
141
2
0
03 Apr 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
Presented at ResearchTrend Connect | LLMAG on 23 Apr 2025
164
18
0
17 Mar 2025
Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity
Yupu Hao
Pengfei Cao
Zhuoran Jin
Huanxuan Liao
Yubo Chen
Kang Liu
Jun Zhao
LLMAG
292
1
0
02 Mar 2025
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Frank F. Xu
Yufan Song
Boxuan Li
Yuxuan Tang
Kritanjali Jain
...
Wayne Chi
Lawrence Jang
Yiqing Xie
Shuyan Zhou
Graham Neubig
LLMAG
154
29
0
18 Dec 2024
Beyond Browsing: API-Based Web Agents
Yueqi Song
Frank F. Xu
Shuyan Zhou
Graham Neubig
74
16
0
21 Oct 2024
Tur[k]ingBench: A Challenge Benchmark for Web Agents
Kevin Xu
Yeganeh Kordi
Kate Sanders
Yizhong Wang
Adam Byerly
Jingyu Zhang
Benjamin Van Durme
Daniel Khashabi
LLMAG
82
6
0
18 Mar 2024