API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

14 April 2023
Minghao Li
Yingxiu Zhao
Yu Bowen
Feifan Song
Hangyu Li
Haiyang Yu
Zhoujun Li
Fei Huang
Yongbin Li
    ELM
    RALM
    CLL

Papers citing "API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs"

50 / 113 papers shown
Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval
Yanfei Chen
Jinsung Yoon
Devendra Singh Sachan
Qingze Wang
Vincent Cohen-Addad
M. Bateni
Chen-Yu Lee
Tomas Pfister
34
5
0
03 Aug 2024
MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance
Jihye Choi
Nils Palumbo
P. Chalasani
Matthew M. Engelhard
Somesh Jha
Anivarya Kumar
David Page
44
4
0
03 Aug 2024
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Mengkang Hu
Yixiao Wang
Can Xu
Lingfeng Sun
Chensheng Peng
T. Hannagan
Nicola Poerio
Saravan Rajmohan
LM&Ro
LLMAG
72
16
0
01 Aug 2024
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Zehui Chen
Kuikun Liu
Qiuchen Wang
Jiangning Liu
Wenwei Zhang
Kai Chen
Feng Zhao
LLMAG
78
21
0
29 Jul 2024
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
Chun-Yi Kuan
Chih-Kai Yang
Wei-Ping Huang
Ke-Han Lu
Hung-yi Lee
55
7
0
13 Jul 2024
What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks
Chengrui Huang
Zhengliang Shi
Yuntao Wen
Xiuying Chen
Peng Han
Shen Gao
Shuo Shang
47
1
0
03 Jul 2024
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
Ibrahim Abdelaziz
Kinjal Basu
Mayank Agarwal
Yara Rizk
Matthew Stallone
...
Merve Unuvar
David D. Cox
Salim Roukos
Luis A. Lastras
Pavan Kapanipathi
LLMAG
36
21
0
27 Jun 2024
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Zuxin Liu
Thai Hoang
Jianguo Zhang
Ming Zhu
Tian Lan
...
Silvio Savarese
Juan Carlos Niebles
Huan Wang
Shelby Heinecke
Caiming Xiong
55
46
0
26 Jun 2024
Enhancing Tool Retrieval with Iterative Feedback from Large Language Models
Qiancheng Xu
Yongqi Li
Heming Xia
Wenjie Li
KELM
42
4
0
25 Jun 2024
Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research
Tianyu Liu
Yijia Xiao
Xiao Luo
Hua Xu
W. Zheng
Hongyu Zhao
42
3
0
21 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Shu Wang
Xing Xie
ALM
ELM
56
5
0
20 Jun 2024
CodeNav: Beyond tool-use to using real-world codebases with LLM agents
Tanmay Gupta
Luca Weihs
Aniruddha Kembhavi
LLMAG
ELM
61
1
0
18 Jun 2024
Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?
Seungbin Yang
ChaeHun Park
Taehee Kim
Jaegul Choo
46
2
0
18 Jun 2024
R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models
Shangqing Tu
Yuanchun Wang
Jifan Yu
Yuyang Xie
Yaran Shi
Xiaozhi Wang
Jing Zhang
Lei Hou
Juanzi Li
ELM
51
3
0
17 Jun 2024
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Joongwon Kim
Bhargavi Paranjape
Tushar Khot
Hannaneh Hajishirzi
LM&Ro
ELM
LLMAG
LRM
46
9
0
10 Jun 2024
Open Grounded Planning: Challenges and Benchmark Construction
Shiguang Guo
Ziliang Deng
Hongyu Lin
Yaojie Lu
Xianpei Han
Le Sun
LRM
LM&Ro
LLMAG
36
1
0
05 Jun 2024
A Survey of Useful LLM Evaluation
Ji-Lun Peng
Sijia Cheng
Egil Diau
Yung-Yu Shih
Po-Heng Chen
Yen-Ting Lin
Yun-Nung Chen
LLMAG
ELM
39
12
0
03 Jun 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
LLMAG
36
87
0
28 May 2024
Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents
Zhengliang Shi
Shen Gao
Xiuyi Chen
Yue Feng
Lingyong Yan
Haibo Shi
Dawei Yin
Zhumin Chen
Suzan Verberne
LLMAG
47
15
0
26 May 2024
Towards Completeness-Oriented Tool Retrieval for Large Language Models
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
KELM
33
7
0
25 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
72
44
0
23 May 2024
Language Models can Evaluate Themselves via Probability Discrepancy
Tingyu Xia
Bowen Yu
Yuan Wu
Yi-Ju Chang
Chang Zhou
ELM
37
4
0
17 May 2024
Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark
Mengsong Wu
Tong Zhu
Han Han
Chuanyuan Tan
Xiang Zhang
Wenliang Chen
31
17
0
14 May 2024
CACTUS: Chemistry Agent Connecting Tool-Usage to Science
Andrew D. McNaughton
Gautham Ramalaxmi
Agustin Kruel
C. Knutson
R. Varikoti
Neeraj Kumar
58
7
0
02 May 2024
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Olly Styles
Sam Miller
Patricio Cerda-Mardini
T. Guha
Victor Sanchez
Bertie Vidgen
LLMAG
41
3
0
01 May 2024
From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen
Xintao Wang
Rui Xu
Siyu Yuan
Yikai Zhang
...
Caiyu Hu
Siye Wu
Scott Ren
Ziquan Fu
Yanghua Xiao
64
79
0
28 Apr 2024
Attacks on Third-Party APIs of Large Language Models
Wanru Zhao
Vidit Khazanchi
Haodi Xing
Xuanli He
Qiongkai Xu
Nicholas D. Lane
31
6
0
24 Apr 2024
Octopus: On-device language model for function calling of software APIs
Wei Chen
Zhiyuan Li
Mingyuan Ma
LLMAG
27
14
0
02 Apr 2024
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Deyi Xiong
ELM
43
0
0
19 Mar 2024
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Chuang Liu
Linhao Yu
Jiaxuan Li
Renren Jin
Yufei Huang
...
Tao Liu
Jinwang Song
Hongying Zan
Sun Li
Deyi Xiong
ELM
40
7
0
18 Mar 2024
What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang
Zhoujun Cheng
Hao Zhu
Daniel Fried
Graham Neubig
68
27
0
18 Mar 2024
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
Zhicheng Guo
Sijie Cheng
Hao Wang
Shihao Liang
Yujia Qin
Peng Li
Zhiyuan Liu
Maosong Sun
Yang Liu
ELM
52
24
0
12 Mar 2024
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents
Jinyang Li
Nan Huo
Yan Gao
Jiayi Shi
Yingxiu Zhao
Ge Qu
Yurong Wu
Chenhao Ma
Jian-Guang Lou
Reynold Cheng
LLMAG
37
3
0
08 Mar 2024
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments
Junzhe Chen
Xuming Hu
Shuodi Liu
Shiyu Huang
Weijuan Tu
Zhaofeng He
Lijie Wen
ELM
LLMAG
48
10
0
26 Feb 2024
AttributionBench: How Hard is Automatic Attribution Evaluation?
Yifei Li
Xiang Yue
Zeyi Liao
Huan Sun
HILM
35
13
0
23 Feb 2024
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
Zekun Li
Zhiyu Zoey Chen
Mike Ross
Patrick Huber
Seungwhan Moon
Zhaojiang Lin
Xin Luna Dong
Adithya Sagar
Xifeng Yan
Paul A. Crook
43
22
0
16 Feb 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo
Zeyi Liao
Boyuan Zheng
Yu-Chuan Su
Chaowei Xiao
Huan Sun
AAML
LLMAG
51
15
0
15 Feb 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Ye Wang
Jing Jiang
Min Lin
LLMAG
LM&Ro
37
50
0
13 Feb 2024
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Yu Du
Fangyun Wei
Hongyang R. Zhang
LLMAG
40
38
0
06 Feb 2024
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Jian Xie
Kai Zhang
Jiangjie Chen
Tinghui Zhu
Renze Lou
Yuandong Tian
Yanghua Xiao
Yu-Chuan Su
LLMAG
LM&Ro
62
136
0
02 Feb 2024
Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
Shijue Huang
Wanjun Zhong
Jianqiao Lu
Qi Zhu
Jiahui Gao
...
Yasheng Wang
Lifeng Shang
Xin Jiang
Ruifeng Xu
Qun Liu
LLMAG
30
33
0
30 Jan 2024
RE-GAINS & EnChAnT: Intelligent Tool Manipulation Systems For Enhanced Query Responses
Sahil Girhepuje
Siva Sankar Sajeev
Purvam Jain
Arya Sikder
Adithya Rama Varma
Ryan George
Akshay Govind Srinivasan
Mahendra Kurup
Ashmit Sinha
Sudip Mondal
RALM
37
0
0
28 Jan 2024
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Taicheng Guo
Preslav Nakov
Yaqi Wang
Ruidi Chang
Shichao Pei
Nitesh Chawla
Olaf Wiest
Xiangliang Zhang
LLMAG
LM&Ro
AI4CE
LRM
47
252
0
21 Jan 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
...
Binglin Zhou
Fangqi Li
ZhuoSheng Zhang
Rui Wang
Gongshen Liu
ELM
38
62
0
18 Jan 2024
EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records
Wenqi Shi
Ran Xu
Yuchen Zhuang
Yue Yu
Jieyu Zhang
Hang Wu
Yuanda Zhu
Joyce C. Ho
Carl Yang
M. D. Wang
32
27
0
13 Jan 2024
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
Siyu Yuan
Kaitao Song
Jiangjie Chen
Xu Tan
Yongliang Shen
Ren Kan
Dongsheng Li
Deqing Yang
LLMAG
31
54
0
11 Jan 2024
Open-TI: Open Traffic Intelligence with Augmented Language Model
Longchao Da
Kuanru Liou
Tiejin Chen
Xuesong Zhou
Xiangyong Luo
Yezhou Yang
Hua Wei
49
22
0
30 Dec 2023
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark
Xiao Liu
Jianfeng Lin
Jiawei Zhang
38
2
0
21 Nov 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
ZhuoSheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG
LM&Ro
LRM
44
53
0
20 Nov 2023
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
Jiao Ou
Junda Lu
Che Liu
Yihong Tang
Fuzheng Zhang
Di Zhang
Kun Gai
ALM
LM&MA
34
14
0
03 Nov 2023