ResearchTrend.AI
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
arXiv: 2401.05507

10 January 2024
Xueyu Hu
Ziyu Zhao
Shuang Wei
Ziwei Chai
Qianli Ma
Guoyin Wang
Xuwu Wang
Jing Su
Jingjing Xu
Ming Zhu
Yao Cheng
Jianbo Yuan
Jiwei Li
Kun Kuang
Yang Yang
Hongxia Yang
Fei Wu
    LMTD
    ELM

Papers citing "InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks"

43 / 43 papers shown
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL
VLM
41
0
0
06 May 2025
Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale
Jiale Liu
Yifan Zeng
Shaokun Zhang
Chi Zhang
Malte Højmark-Bertelsen
Marie Normann Gadeberg
H. Wang
Qingyun Wu
41
0
0
06 May 2025
Towards Automated Scoping of AI for Social Good Projects
Jacob Emmerson
Rayid Ghani
Zheyuan Ryan Shi
142
0
0
28 Apr 2025
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
Yuhang Liu
Pengxiang Li
C. Xie
Xavier Hu
Xiaotian Han
Shengyu Zhang
Hongxia Yang
Fei Wu
LLMAG
LM&Ro
LRM
AI4CE
72
2
0
19 Apr 2025
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation
Haokun Liu
Sicong Huang
Jingyu Hu
Yangqiaoyu Zhou
Chenhao Tan
32
0
0
15 Apr 2025
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery
Amirhossein Abaskohi
A. Ramesh
Shailesh Nanisetty
Chirag Goel
David Vazquez
Christopher Pal
Spandana Gella
Giuseppe Carenini
I. Laradji
39
0
0
10 Apr 2025
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
Tengjun Jin
Yuxuan Zhu
Daniel Kang
LMTD
ELM
47
0
0
07 Apr 2025
Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors
Zhiyu Yang
Shuo Wang
Yukun Yan
Yang Deng
31
0
0
28 Mar 2025
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
Sky CH-Wang
Darshan Deshpande
Smaranda Muresan
Anand Kannappan
Rebecca Qian
59
1
0
24 Mar 2025
AgentRxiv: Towards Collaborative Autonomous Research
Samuel Schmidgall
Michael Moor
68
3
0
23 Mar 2025
Won: Establishing Best Practices for Korean Financial NLP
Guijin Son
Hyunwoo Ko
Haneral Jung
Chami Hwang
49
0
0
23 Mar 2025
DatawiseAgent: A Notebook-Centric LLM Agent Framework for Automated Data Science
Ziming You
Yumiao Zhang
Dexuan Xu
Yiwei Lou
Yandong Yan
Wei Wang
H. Zhang
Yu Huang
LLMAG
62
0
0
10 Mar 2025
Exploring LLM Agents for Cleaning Tabular Machine Learning Datasets
Tommaso Bendinelli
Artur Dox
Christian Holz
LLMAG
76
0
0
09 Mar 2025
StatLLM: A Dataset for Evaluating the Performance of Large Language Models in Statistical Analysis
Xinyi Song
Lina Lee
Kexin Xie
Xueying Liu
Xinwei Deng
Yili Hong
ALM
ELM
157
0
0
24 Feb 2025
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science
Qiuhai Zeng
Claire Jin
Xinyue Wang
Yuhan Zheng
Qunhua Li
48
0
0
23 Feb 2025
DataSciBench: An LLM Agent Benchmark for Data Science
Dan Zhang
Sining Zhoubian
Min Cai
Fengzu Li
L. Yang
Wei Wang
Tianjiao Dong
Ziniu Hu
J. Tang
Yisong Yue
ALM
ELM
46
2
0
20 Feb 2025
CoddLLM: Empowering Large Language Models for Data Analytics
Jiani Zhang
Hengrui Zhang
Rishav Chakravarti
Yiqun Hu
Patrick K. L. Ng
Asterios Katsifodimos
Huzefa Rangwala
George Karypis
Alon Halevy
SyDa
ELM
184
0
0
01 Feb 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao
Yuk-Kit Chan
Zixuan Ling
Wenxuan Wang
Shuqing Li
...
Pinjia He
Shuai Wang
Zibin Zheng
Michael R. Lyu
Shing-Chi Cheung
ALM
71
2
0
18 Jan 2025
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Y. Liu
Pengxiang Li
Zishu Wei
C. Xie
Xueyu Hu
Xinchen Xu
Shengyu Zhang
Xiaotian Han
Hongxia Yang
Fei Wu
LLMAG
LRM
53
11
0
08 Jan 2025
MiMoTable: A Multi-scale Spreadsheet Benchmark with Meta Operations for Table Reasoning
Zheng Li
Yang Du
Mao Zheng
Mingyang Song
LMTD
80
0
0
16 Dec 2024
AutoGLM: Autonomous Foundation Agents for GUIs
Xiao Liu
Bo Qin
Dongzhu Liang
Guang Dong
Hanyu Lai
...
Yujia Wang
Yongjun Xu
Zehan Qi
Yuxiao Dong
Jie Tang
LLMAG
64
12
0
28 Oct 2024
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
Yiming Huang
Jianwen Luo
Yan Yu
Yitong Zhang
Fangyu Lei
...
Shizhu He
Lifu Huang
Xiao Liu
Jun Zhao
Kang Liu
ELM
ALM
AI4CE
21
6
0
09 Oct 2024
A Survey on Complex Tasks for Goal-Directed Interactive Agents
Mareike Hartmann
Alexander Koller
LM&Ro
LLMAG
34
0
0
27 Sep 2024
Text2SQL is Not Enough: Unifying AI and Databases with TAG
Asim Biswal
Liana Patel
Siddarth Jha
Amog Kamsetty
Shu Liu
Joseph E. Gonzalez
Carlos Guestrin
Matei A. Zaharia
LMTD
3DV
34
15
0
27 Aug 2024
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Ruisheng Cao
Fangyu Lei
Haoyuan Wu
Jixuan Chen
Yeqiao Fu
...
Qian Liu
Victor Zhong
Lu Chen
Kai Yu
Tao Yu
43
18
0
15 Jul 2024
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
Songyang Zhang
Chuyu Zhang
Yingfan Hu
Haowen Shen
Kuikun Liu
...
Fengzhe Zhou
Wenwei Zhang
Xuming He
Dahua Lin
Kai-xiang Chen
44
1
0
15 Jul 2024
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Gaurav Sahu
Abhay Puri
Juan A. Rodriguez
Alexandre Drouin
Perouz Taslakian
...
Christopher Pal
Nicolas Chapados
I. Laradji
Sai Rajeswar Mudumba
Issam Hadj Laradji
ELM
48
4
0
08 Jul 2024
SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation
Zeyao Ma
Bohan Zhang
Jing Zhang
Jifan Yu
Xiaokang Zhang
Xiaohan Zhang
Sijia Luo
Xi Wang
Jie Tang
LMTD
62
5
0
21 Jun 2024
Are Large Language Models Good Statisticians?
Yizhang Zhu
Shiyin Du
Boyan Li
Yuyu Luo
Nan Tang
ELM
40
15
0
12 Jun 2024
Adaptive In-conversation Team Building for Language Model Agents
Linxin Song
Jiale Liu
Jieyu Zhang
Shaokun Zhang
Ao Luo
Shijian Wang
Qingyun Wu
Chi Wang
LLMAG
71
10
0
29 May 2024
Unveiling Disparities in Web Task Handling Between Human and Web Agent
Kihoon Son
Jinhyeon Kwon
DeEun Choi
Tae Soo Kim
Young-Ho Kim
Sangdoo Yun
Juho Kim
LLMAG
32
0
0
07 May 2024
MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation
Min Zhang
Haoxuan Li
Fei Wu
Kun Kuang
OODD
24
12
0
30 Apr 2024
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
Xiaokang Zhang
Jing Zhang
Zeyao Ma
Yang Li
Bohan Zhang
...
D. Li
Shu Zhao
Juan-Zi Li
Jie Tang
J. Tang
LMTD
RALM
44
20
0
28 Mar 2024
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents
Jinyang Li
Nan Huo
Yan Gao
Jiayi Shi
Yingxiu Zhao
Ge Qu
Yurong Wu
Chenhao Ma
Jian-Guang Lou
Reynold Cheng
LLMAG
34
3
0
08 Mar 2024
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
Xueqing Wu
Rui Zheng
Jingzhen Sha
Te-Lin Wu
Hanyu Zhou
Mohan Tang
Kai-Wei Chang
Nanyun Peng
Haoran Huang
55
1
0
04 Mar 2024
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
Xiao Liu
Zirui Wu
Xueqing Wu
Pan Lu
Kai-Wei Chang
Yansong Feng
ELM
LRM
40
27
0
27 Feb 2024
Large Language Model for Table Processing: A Survey
Weizheng Lu
Jiaming Zhang
Jing Zhang
Yueguo Chen
LMTD
60
24
0
04 Feb 2024
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
267
2,494
0
06 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
253
1,073
0
05 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,077
0
24 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
389
8,495
0
28 Jan 2022
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
208
624
0
20 May 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
201
853
0
09 Feb 2021