ResearchTrend.AI
arXiv: 2407.00132
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

28 June 2024
Haiyang Shen
Yue Li
Desong Meng
Dongqi Cai
Sheng Qi
Li Zhang
Mengwei Xu
Yun Ma
    LLMAG

Papers citing "ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents"

10 / 10 papers shown
FamilyTool: A Multi-hop Personalized Tool Use Benchmark
Yuxin Wang
Yiran Guo
Y. Zheng
Zhangyue Yin
Tian Jin
Jie Yang
Jiajun Chen
Xuanjing Huang
Xipeng Qiu
24
0
0
09 Apr 2025
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Peijie Yu
Yifan Yang
Jiyang Li
Zelong Zhang
Haorui Wang
Xiao Feng
Feng Zhang
LLMAG
117
0
0
03 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
64
1
0
31 Mar 2025
StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs
Zhicheng Guo
Sijie Cheng
Yuchen Niu
Hao Wang
Sicheng Zhou
Wenbing Huang
Yang Liu
CLL
OffRL
88
0
0
26 Mar 2025
PEToolLLM: Towards Personalized Tool Learning in Large Language Models
Qiancheng Xu
Yunshui Li
Heming Xia
Fan Liu
Min Yang
Wenjie Li
72
0
0
26 Feb 2025
Position: Standard Benchmarks Fail -- LLM Agents Present Overlooked Risks for Financial Applications
Zichen Chen
Jiaao Chen
Jianda Chen
Misha Sra
ELM
38
1
0
21 Feb 2025
Beyond Browsing: API-Based Web Agents
Yueqi Song
Frank F. Xu
Shuyan Zhou
Graham Neubig
61
16
0
21 Oct 2024
NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls
Kinjal Basu
Ibrahim Abdelaziz
Kiran Kate
Mayank Agarwal
Maxwell Crouse
...
Sadhana Kumaravel
Saurabh Goyal
Luis Lastras
Luis A. Lastras
Pavan Kapanipathi
33
7
0
04 Sep 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
LLMAG
34
83
0
28 May 2024
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik R. Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
273
2,549
0
06 Oct 2022