ResearchTrend.AI
TaskBench: Benchmarking Large Language Models for Task Automation


30 November 2023
Yongliang Shen
Kaitao Song
Xu Tan
Wenqi Zhang
Kan Ren
Siyu Yuan
Weiming Lu
Dongsheng Li
Yueting Zhuang

Papers citing "TaskBench: Benchmarking Large Language Models for Task Automation"

39 / 39 papers shown
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
2
0
26 Apr 2025
FamilyTool: A Multi-hop Personalized Tool Use Benchmark
Yuxin Wang
Yiran Guo
Y. Zheng
Zhangyue Yin
Tian Jin
Jie Yang
Jiajun Chen
Xuanjing Huang
Xipeng Qiu
24
0
0
09 Apr 2025
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Peijie Yu
Yifan Yang
Jiajian Li
Zelong Zhang
Haorui Wang
Xiao Feng
Feng Zhang
LLMAG
117
0
0
03 Apr 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Wenbo Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro
LRM
75
3
0
27 Mar 2025
PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play
Wei Fang
Yang Zhang
Kaizhi Qian
James R. Glass
Yada Zhu
LLMAG
73
0
0
18 Mar 2025
DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL
Haoyuan Ma
Yongliang Shen
Hengwei Liu
Wenqi Zhang
Haolei Xu
Qiuying Peng
Jun Wang
Weiming Lu
49
1
0
06 Mar 2025
From Hypothesis to Publication: A Comprehensive Survey of AI-Driven Research Support Systems
Zekun Zhou
Xiaocheng Feng
L. Huang
Xiachong Feng
Ziyun Song
...
Baoxin Wang
Dayong Wu
Guoping Hu
Ting Liu
Bing Qin
AI4TS
77
1
0
03 Mar 2025
PEToolLLM: Towards Personalized Tool Learning in Large Language Models
Qiancheng Xu
Yunshui Li
Heming Xia
Fan Liu
Min Yang
Wenjie Li
72
0
0
26 Feb 2025
REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems
Longling Geng
Edward Y. Chang
LLMAG
79
1
0
26 Feb 2025
USPilot: An Embodied Robotic Assistant Ultrasound System with Large Language Model Enhanced Graph Planner
Mingcong Chen
Siqi Fan
Guanglin Cao
Hongbin Liu
55
0
0
18 Feb 2025
AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement
Shivam Singh
Karthik Swaminathan
Nabanita Dash
Ramandeep Singh
Snehasis Banerjee
Mohan Sridharan
Madhava Krishna
LLMAG
LM&Ro
108
0
0
04 Feb 2025
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
Han Han
Tong Zhu
Xiang Zhang
Mengsong Wu
Hao Xiong
Wenliang Chen
35
0
0
08 Jan 2025
DaDu-E: Rethinking the Role of Large Language Model in Robotic Computing Pipeline
Wenhao Sun
Sai Hou
Zehao Wang
Bo Yu
Shaoshan Liu
Xu Yang
Shuai Liang
Yiming Gan
Yinhe Han
LLMAG
121
2
0
02 Dec 2024
Action Engine: An LLM-based Framework for Automatic FaaS Workflow Generation
Akiharu Esashi
Pawissanutt Lertpongrujikorn
M. Salehi
79
0
0
29 Nov 2024
BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving
Teng Wang
Wing-Yin Yu
Zhenqi He
Zehua Liu
Xiongwei Han
...
Han Wu
Wei Shi
Ruifeng She
Fangzhou Zhu
Tao Zhong
AIMat
OffRL
LRM
82
3
0
26 Nov 2024
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu
Yufei Guo
Yuan Meng
Yanning Zhang
Le Sun
Zhi Wang
210
0
0
25 Nov 2024
Advancing Agentic Systems: Dynamic Task Decomposition, Tool Integration and Evaluation using Novel Metrics and Dataset
Adrian Garret Gabriel
Alaa Alameer Ahmad
Shankar Kumar Jeyakumar
LLMAG
31
1
0
29 Oct 2024
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Mingyang Chen
Haoze Sun
Tianpeng Li
Fan Yang
Hao Liang
Keer Lu
Bin Cui
Wentao Zhang
Zenan Zhou
Weipeng Chen
LRM
52
5
0
16 Oct 2024
Skill Learning Using Process Mining for Large Language Model Plan Generation
Andrei Cosmin Redis
M. Sani
Bahram Zarrin
Andrea Burattin
34
0
0
14 Oct 2024
A Survey on Complex Tasks for Goal-Directed Interactive Agents
Mareike Hartmann
Alexander Koller
LM&Ro
LLMAG
34
0
0
27 Sep 2024
LLM With Tools: A Survey
Zhuocheng Shen
43
8
0
24 Sep 2024
ProcessTBench: An LLM Plan Generation Dataset for Process Mining
Andrei Cosmin Redis
M. Sani
Bahram Zarrin
Andrea Burattin
20
1
0
13 Sep 2024
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
Ben Bogin
Kejuan Yang
Shashank Gupta
Kyle Richardson
Erin Bransom
Peter Clark
Ashish Sabharwal
Tushar Khot
ELM
LRM
47
10
0
11 Sep 2024
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Zehui Chen
Kuikun Liu
Qiuchen Wang
Jiangning Liu
Wenwei Zhang
Kai Chen
Feng Zhao
LLMAG
78
20
0
29 Jul 2024
AI Agents That Matter
Sayash Kapoor
Benedikt Stroebl
Zachary S. Siegel
Nitya Nadgir
Arvind Narayanan
49
37
0
01 Jul 2024
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
Siyu Yuan
Kaitao Song
Jiangjie Chen
Xu Tan
Dongsheng Li
Deqing Yang
LLMAG
66
14
0
20 Jun 2024
What is the best model? Application-driven Evaluation for Large Language Models
Shiguo Lian
Kaikai Zhao
Xinhui Liu
Xuejiao Lei
Bikun Yang
Wenjing Zhang
Kai Wang
Zhaoxiang Liu
ALM
ELM
43
2
0
14 Jun 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
LLMAG
34
83
0
28 May 2024
From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen
Xintao Wang
Rui Xu
Siyu Yuan
Yikai Zhang
...
Caiyu Hu
Siye Wu
Scott Ren
Ziquan Fu
Yanghua Xiao
62
79
0
28 Apr 2024
Adapting LLMs for Efficient Context Processing through Soft Prompt Compression
Cangqing Wang
Yutian Yang
Ruisi Li
Dan Sun
Ruicong Cai
Yuzhu Zhang
Chengqian Fu
Lillian Floyd
29
16
0
07 Apr 2024
What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang
Zhoujun Cheng
Hao Zhu
Daniel Fried
Graham Neubig
68
27
0
18 Mar 2024
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma
Weikai Huang
Jieyu Zhang
Tanmay Gupta
Ranjay Krishna
55
18
0
17 Mar 2024
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
Wenqi Zhang
Ke Tang
Hai Wu
Mengna Wang
Yongliang Shen
Guiyang Hou
Zeqi Tan
Peng Li
Yueting Zhuang
Weiming Lu
LLMAG
44
37
0
27 Feb 2024
AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability
Siwei Yang
Bingchen Zhao
Cihang Xie
LRM
17
6
0
14 Feb 2024
TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation
Yikai Zhang
Siyu Yuan
Caiyu Hu
Kyle Richardson
Yanghua Xiao
Jiangjie Chen
AI4CE
LLMAG
32
13
0
08 Feb 2024
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
Siyu Yuan
Kaitao Song
Jiangjie Chen
Xu Tan
Yongliang Shen
Kan Ren
Dongsheng Li
Deqing Yang
LLMAG
28
54
0
11 Jan 2024
ProTIP: Progressive Tool Retrieval Improves Planning
R. Anantha
Bortik Bandyopadhyay
Anirudh Kashi
Sayantan Mahinder
Andrew W Hill
Srinivas Chappidi
24
6
0
16 Dec 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,984
0
20 Apr 2018