arXiv:2403.11085 (latest version: v3)
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
17 March 2024
Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna
Links: arXiv (abs), PDF, HTML, GitHub (39★)
Papers citing "m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks" (4 papers)
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
Zaid Khan, Ali Farhadi, Ranjay Krishna, Luca Weihs, Joey Tianyi Zhou, Tanmay Gupta
82
0
0
21 Feb 2025
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan, Elias Stengel-Eskin, Jaemin Cho, Joey Tianyi Zhou
VGen
185
3
0
08 Oct 2024
Adaptive In-conversation Team Building for Language Model Agents
Linxin Song, Jiale Liu, Jieyu Zhang, Shaokun Zhang, Ao Luo, Shijian Wang, Qingyun Wu, Chi Wang
LLMAG
162
14
0
29 May 2024
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
Samuel Schmidgall, Rojin Ziaei, Carl Harris, Eduardo Reis, Jeffrey Jopling, Michael Moor
256
55
0
13 May 2024