2310.03302
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
5 October 2023
Qian Huang
Jian Vora
Percy Liang
J. Leskovec
ELM
LLMAG
Papers citing
"MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation"
23 / 23 papers shown
Title
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Rushi Qiang
Yuchen Zhuang
Yinghao Li
D. Kilman
Rongzhi Zhang
...
Ian Shu-Hei Wong
Sherry Yang
Percy Liang
Chao Zhang
Bo Dai
ELM
41
0
0
12 May 2025
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey
Da Zheng
Lun Du
Junwei Su
Yuchen Tian
Yuqi Zhu
Jintian Zhang
Lanning Wei
Ningyu Zhang
H. Chen
LRM
61
0
0
06 May 2025
ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies
Shubham Gandhi
Dhruv Shah
Manasi S. Patwardhan
L. Vig
Gautam M. Shroff
LLMAG
AI4CE
146
0
0
28 Apr 2025
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo
Jinheon Baek
Seongyun Lee
Sung Ju Hwang
AI4CE
44
0
0
24 Apr 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
R. Zimmermann
Lijun Sun
Roger Zimmermann
Jinhua Zhao
AI4CE
78
0
0
15 Apr 2025
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?
Yunxiang Zhang
Muhammad Khalifa
Shitanshu Bhushan
Grant D Murphy
Lajanugen Logeswaran
Jaekyeom Kim
Moontae Lee
Honglak Lee
Lu Wang
LLMAG
ELM
64
0
0
13 Apr 2025
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
Tengjun Jin
Yuxuan Zhu
Daniel Kang
LMTD
ELM
47
0
0
07 Apr 2025
PaperBench: Evaluating AI's Ability to Replicate AI Research
Giulio Starace
Oliver Jaffe
Dane Sherburn
James Aung
Jun Shern Chan
...
Benjamin Kinsella
Wyatt Thompson
Johannes Heidecke
Amelia Glaese
Tejal Patwardhan
ALM
ELM
802
7
0
02 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
61
1
0
31 Mar 2025
Measuring AI Ability to Complete Long Tasks
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
...
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
82
6
0
18 Mar 2025
SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
Xiangchao Yan
Shiyang Feng
Jiakang Yuan
Renqiu Xia
Bin Wang
Bo Zhang
Junlin Wu
60
2
0
06 Mar 2025
AIDE: AI-Driven Exploration in the Space of Code
Zhengyao Jiang
Dominik Schmidt
Dhruv Srikanth
Dixing Xu
Ian Kaplan
Deniss Jacenko
Yuxiang Wu
69
5
0
18 Feb 2025
DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration
Sizhe Liu
Yaojie Lu
Siyu Chen
Xiyang Hu
Jieyu Zhao
Tianfan Fu
Yue Zhao
LLMAG
81
6
0
24 Nov 2024
AAAR-1.0: Assessing AI's Potential to Assist Research
Renze Lou
Hanzi Xu
Sijia Wang
Jiangshu Du
Ryo Kamoi
...
Xi Li
Kaipeng Zhang
Congying Xia
Lifu Huang
Wenpeng Yin
37
5
0
29 Oct 2024
Automating Traffic Model Enhancement with AI Research Agent
Xusen Guo
Xinxi Yang
Mingxing Peng
Hongliang Lu
Meixin Zhu
Hai Yang
62
1
0
25 Sep 2024
AI Agents That Matter
Sayash Kapoor
Benedikt Stroebl
Zachary S. Siegel
Nitya Nadgir
Arvind Narayanan
49
36
0
01 Jul 2024
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents
Peter Alexander Jansen
Marc-Alexandre Côté
Tushar Khot
Erin Bransom
Bhavana Dalvi Mishra
Bodhisattwa Prasad Majumder
Oyvind Tafjord
Peter Clark
LLMAG
43
21
0
10 Jun 2024
SciMON: Scientific Inspiration Machines Optimized for Novelty
Qingyun Wang
Doug Downey
Heng Ji
Tom Hope
LLMAG
37
61
0
23 May 2023
AutoML-GPT: Automatic Machine Learning with GPT
Shujian Zhang
Chengyue Gong
Lemeng Wu
Xingchao Liu
Mi Zhou
LLMAG
67
60
0
04 May 2023
Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems
Stefan Kramer
Mattia Cerrato
S. Džeroski
R. King
26
10
0
03 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
232
1,742
0
07 Apr 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
267
2,494
0
06 Oct 2022
The CLRS Algorithmic Reasoning Benchmark
Petar Veličković
Adria Puigdomenech Badia
David Budden
Razvan Pascanu
Andrea Banino
Mikhail Dashevskiy
R. Hadsell
Charles Blundell
163
88
0
31 May 2022