Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.00132
Cited By
v1
v2
v3 (latest)
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
28 June 2024
Haiyang Shen
Yue Li
Desong Meng
Dongqi Cai
Sheng Qi
Li Zhang
Mengwei Xu
Yudong Han
LLMAG
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents"
31 / 31 papers shown
Title
AI Scientists Fail Without Strong Implementation Capability
Minjun Zhu
Qiujie Xie
Yixuan Weng
Jian Wu
Zhen Lin
Linyi Yang
Yue Zhang
ELM
102
0
0
02 Jun 2025
FamilyTool: A Multi-hop Personalized Tool Use Benchmark
Yuxin Wang
Yiran Guo
Y. Zheng
Zhangyue Yin
Tian Jin
Jie Yang
Jiajun Chen
Yuan Li
Xuanjing Huang
Xipeng Qiu
94
0
0
09 Apr 2025
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Peijie Yu
Yifan Yang
Jiajian Li
Zelong Zhang
Haorui Wang
Xiao Feng
Feng Zhang
LLMAG
225
2
0
03 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
182
4
0
31 Mar 2025
StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs
Zhicheng Guo
Sijie Cheng
Yuchen Niu
Hao Wang
Sicheng Zhou
Wenbing Huang
Yang Liu
CLL
OffRL
221
0
0
26 Mar 2025
PEToolLLM: Towards Personalized Tool Learning in Large Language Models
Qiancheng Xu
Yunshui Li
Heming Xia
Fan Liu
Min Yang
Wenjie Li
165
0
0
26 Feb 2025
Standard Benchmarks Fail - Auditing LLM Agents in Finance Must Prioritize Risk
Zichen Chen
Jiaao Chen
Jianda Chen
Misha Sra
ELM
169
1
0
21 Feb 2025
Beyond Browsing: API-Based Web Agents
Yueqi Song
Frank F. Xu
Shuyan Zhou
Graham Neubig
172
23
0
21 Oct 2024
NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls
Kinjal Basu
Ibrahim Abdelaziz
Kiran Kate
Mayank Agarwal
Maxwell Crouse
...
Sadhana Kumaravel
Saurabh Goyal
Xin Wang
Luis A. Lastras
Pavan Kapanipathi
103
11
0
04 Sep 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jun Xu
Jirong Wen
LLMAG
105
107
0
28 May 2024
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
Cheng Qian
Bingxiang He
Zhuang Zhong
Jia Deng
Yujia Qin
...
Zhong Zhang
Jie Zhou
Yankai Lin
Zhiyuan Liu
Maosong Sun
74
36
0
14 Feb 2024
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV
RALM
350
1,846
1
18 Dec 2023
KwaiAgents: Generalized Information-seeking Agent System with Large Language Models
Haojie Pan
Zepeng Zhai
Hao Yuan
Yaojia Lv
Ruiji Fu
Ming Liu
Zhongyuan Wang
Bing Qin
LLMAG
RALM
83
12
0
08 Dec 2023
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
Yue Huang
Jiawen Shi
Yuan Li
Chenrui Fan
Siyuan Wu
...
Yixin Liu
Pan Zhou
Yao Wan
Neil Zhenqiang Gong
Lichao Sun
LLMAG
129
96
0
04 Oct 2023
The Rise and Potential of Large Language Model Based Agents: A Survey
Zhiheng Xi
Wenxiang Chen
Xin Guo
Wei He
Yiwen Ding
...
Wenjuan Qin
Yongyan Zheng
Xipeng Qiu
Xuanjing Huan
Tao Gui
LM&MA
LM&Ro
3DV
AI4CE
200
959
0
14 Sep 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG
AI4CE
LM&Ro
227
1,333
0
22 Aug 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Yujia Qin
Shi Liang
Yining Ye
Kunlun Zhu
Lan Yan
...
Jie Zhou
Mark B. Gerstein
Dahai Li
Zhiyuan Liu
Maosong Sun
CLL
ALM
LLMAG
ELM
LM&MA
236
712
0
31 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
270
630
0
12 Jul 2023
ToolQA: A Dataset for LLM Question Answering with External Tools
Yuchen Zhuang
Yue Yu
Kuan-Chieh Wang
Haotian Sun
Chao Zhang
ELM
LLMAG
103
252
0
23 Jun 2023
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
Qiaoyu Tang
Ziliang Deng
Hongyu Lin
Xianpei Han
Qiao Liang
Boxi Cao
Le Sun
CLL
SyDa
139
202
0
08 Jun 2023
On the Tool Manipulation Capability of Open-source Large Language Models
Qiantong Xu
Fenglu Hong
Yangqiu Song
Changran Hu
Zheng Chen
Jian Zhang
LLMAG
110
78
0
25 May 2023
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
Xizhou Zhu
Yuntao Chen
Hao Tian
Chenxin Tao
Weijie Su
...
Lewei Lu
Xiaogang Wang
Yu Qiao
Zhaoxiang Zhang
Jifeng Dai
LLMAG
LM&Ro
121
240
0
25 May 2023
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang
Yuqi Xie
Yunfan Jiang
Ajay Mandlekar
Chaowei Xiao
Yuke Zhu
Linxi Fan
Anima Anandkumar
LM&Ro
SyDa
191
844
0
25 May 2023
Gorilla: Large Language Model Connected with Massive APIs
Shishir G. Patil
Tianjun Zhang
Xin Wang
Joseph E. Gonzalez
ELM
CLL
ALM
SyDa
196
572
0
24 May 2023
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
Shibo Hao
Tianyang Liu
Zhen Wang
Zhiting Hu
RALM
LLMAG
163
183
0
19 May 2023
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs
Jinyang Li
Binyuan Hui
Ge Qu
Jiaxi Yang
Binhua Li
...
Guoliang Li
Kevin C. C. Chang
Fei Huang
Reynold Cheng
Yongbin Li
LMTD
186
422
0
04 May 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
Yongliang Shen
Kaitao Song
Xu Tan
Dongsheng Li
Weiming Lu
Yueting Zhuang
MLLM
178
913
0
30 Mar 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa
RALM
231
1,781
0
09 Feb 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
525
3,007
0
06 Oct 2022
From Robotic Process Automation to Intelligent Process Automation: Emerging Trends
Tathagata Chakraborti
Vatche Isahagian
Rania Y. Khalaf
Y. Khazaeni
Vinod Muthusamy
Sadhana Kumaravel
Merve Unuvar
AI4CE
56
46
0
27 Jul 2020
IFTTT vs. Zapier: A Comparative Study of Trigger-Action Programming Frameworks
Amir Rahmati
Earlence Fernandes
Jaeyeon Jung
A. Prakash
43
35
0
08 Sep 2017
1