ResearchTrend.AI
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

8 July 2024
Gaurav Sahu
Abhay Puri
Juan A. Rodriguez
Alexandre Drouin
Perouz Taslakian
Valentina Zantedeschi
David Vazquez
Nicolas Chapados
Christopher Pal
Sai Rajeswar Mudumba
Issam Hadj Laradji
    ELM

Papers citing "InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation"

30 / 30 papers shown
Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics
Ran Zhang
Mohannad Elhamod
32
0
0
29 May 2025
R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution
Xu Yang
Xiao Yang
Shikai Fang
Bowen Xian
Yuante Li
...
Xinpeng Hong
Weiqing Liu
Yelong Shen
Weizhu Chen
Jiang Bian
26
0
0
20 May 2025
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery
Amirhossein Abaskohi
A. Ramesh
Shailesh Nanisetty
Chirag Goel
David Vazquez
Christopher Pal
Spandana Gella
Giuseppe Carenini
I. Laradji
72
0
0
10 Apr 2025
MDSF: Context-Aware Multi-Dimensional Data Storytelling Framework based on Large language Model
Chengze Zhang
Changshan Li
Shiyang Gao
63
0
0
03 Jan 2025
Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
Masha Belyi
Robert Friel
Shuai Shao
Atindriyo Sanyal
HILM
RALM
79
7
0
03 Jun 2024
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu
Vlad-Andrei Negru
Vasile Suciu
Mihai Surdeanu
50
33
0
11 Apr 2024
InsightLens: Discovering and Exploring Insights from Conversational Contexts in Large-Language-Model-Powered Data Analysis
Luoxuan Weng
Xingbo Wang
Junyu Lu
Yingchaojie Feng
Yihan Liu
Wei Chen
73
5
0
02 Apr 2024
Benchmarking Data Science Agents
Yuge Zhang
Qiyang Jiang
Xingyu Han
Nan Chen
Yuqing Yang
Kan Ren
ELM
43
12
0
27 Feb 2024
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
Xueyu Hu
Ziyu Zhao
Shuang Wei
Ziwei Chai
Qianli Ma
...
Jiwei Li
Kun Kuang
Yang Yang
Hongxia Yang
Leilei Gan
LMTD
ELM
36
51
0
10 Jan 2024
Capture the Flag: Uncovering Data Insights with Large Language Models
I. Laradji
Perouz Taslakian
Sai Rajeswar
Valentina Zantedeschi
Alexandre Lacoste
Nicolas Chapados
David Vazquez
Christopher Pal
Alexandre Drouin
83
3
0
21 Dec 2023
OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie
Fan Zhou
Zhoujun Cheng
Peng Shi
Luoxuan Weng
...
Yiheng Xu
Hongjin Su
Dongchan Shin
Caiming Xiong
Tao Yu
LLMAG
72
97
0
16 Oct 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
256
11,765
0
18 Jul 2023
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
Wenqi Zhang
Yongliang Shen
Weiming Lu
Yueting Zhuang
LLMAG
54
54
0
12 Jun 2023
Beyond Generating Code: Evaluating GPT on a Data Visualization Course
Zhutian Chen
Chenyang Zhang
Qianwen Wang
J. Troidl
Simon Warchol
Johanna Beyer
Nils Gehlenborg
Hanspeter Pfister
115
30
0
05 Jun 2023
Is GPT-4 a Good Data Analyst?
Liying Cheng
Xingxuan Li
Lidong Bing
LM&MA
ELM
74
99
0
24 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
108
590
0
22 May 2023
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Lei Wang
Wanyu Xu
Yihuai Lan
Zhiqiang Hu
Yunshi Lan
Roy Ka-wei Lee
Ee-Peng Lim
ReLM
LRM
97
344
0
06 May 2023
MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
Lei Zhang
Yuge Zhang
Kan Ren
Dongsheng Li
Yuqing Yang
LLMAG
66
39
0
28 Apr 2023
Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System
Pingchuan Ma
Rui Ding
Shuai Wang
Shi Han
Dongmei Zhang
32
20
0
02 Apr 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
153
1,174
0
29 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
973
14,179
0
15 Mar 2023
Natural Language to Code Generation in Interactive Data Science Notebooks
Pengcheng Yin
Wen-Ding Li
Kefan Xiao
Abhishek Rao
Yeming Wen
...
Paige Bailey
Michele Catasta
Henryk Michalewski
Oleksandr Polozov
Charles Sutton
48
61
0
19 Dec 2022
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
Yuhang Lai
Chengxi Li
Yiming Wang
Tianyi Zhang
Ruiqi Zhong
Luke Zettlemoyer
Scott Yih
Daniel Fried
Si-yi Wang
Tao Yu
ELM
ALM
74
329
0
18 Nov 2022
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
380
2,806
0
06 Oct 2022
CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation
Daoguang Zan
Bei Chen
Dejian Yang
Zeqi Lin
Minsu Kim
Bei Guan
Yongji Wang
Weizhu Chen
Jian-Guang Lou
56
120
0
14 Jun 2022
Training and Evaluating a Jupyter Notebook Data Science Assistant
Shubham Chandel
Colin B. Clement
Guillermo Serrato
Neel Sundaresan
64
44
0
30 Jan 2022
FinQA: A Dataset of Numerical Reasoning over Financial Data
Zhiyu Chen
Wenhu Chen
Charese Smiley
Sameena Shah
Iana Borova
...
Reema N Moussa
Matthew I. Beane
Ting-Hao 'Kenneth' Huang
Bryan R. Routledge
Wenjie Wang
AIMat
103
334
0
01 Sep 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
187
5,440
0
07 Jul 2021
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
261
5,764
0
21 Apr 2019
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
Victor Zhong
Caiming Xiong
R. Socher
RALM
85
1,193
0
31 Aug 2017