Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.16504
Cited By
On the Tool Manipulation Capability of Open-source Large Language Models
25 May 2023
Qiantong Xu
Fenglu Hong
Yangqiu Song
Changran Hu
Zheng Chen
Jian Zhang
LLMAG
Re-assign community
ArXiv
PDF
HTML
Papers citing
"On the Tool Manipulation Capability of Open-source Large Language Models"
50 / 68 papers shown
Title
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL
VLM
41
0
0
06 May 2025
Prompt Injection Attack to Tool Selection in LLM Agents
Jiawen Shi
Zenghui Yuan
Guiyao Tie
Pan Zhou
Neil Zhenqiang Gong
Lichao Sun
LLMAG
51
0
0
28 Apr 2025
Bridging Language Models and Financial Analysis
Alejandro Lopez-Lira
Jihoon Kwon
Sangwoon Yoon
Jy-yong Sohn
Chanyeol Choi
AIFin
41
0
0
14 Mar 2025
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng
Zhiyuan Huang
Zhenghai Xue
Xinrun Wang
Bo An
Shuicheng Yan
82
14
0
17 Feb 2025
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie
Qingqiu Li
Zhuohao Yu
Yuejie Zhang
Yue Zhang
Linyi Yang
ELM
63
1
0
15 Feb 2025
AgentRec: Agent Recommendation Using Sentence Embeddings Aligned to Human Feedback
Joshua Park
Yongfeng Zhang
LLMAG
LM&Ro
95
1
0
23 Jan 2025
Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
Hiroki Furuta
Yutaka Matsuo
Aleksandra Faust
Izzeddin Gur
CLL
92
14
0
03 Jan 2025
PTR: Precision-Driven Tool Recommendation for Large Language Models
Hang Gao
Yongfeng Zhang
KELM
46
0
0
14 Nov 2024
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Pei Wang
Yanan Wu
Zekun Wang
Jiaheng Liu
Xiaoshuai Song
...
Ge Zhang
Hangyu Guo
Zhaoxiang Zhang
Wenbo Su
Bo Zheng
ELM
39
2
0
15 Oct 2024
JurEE not Judges: safeguarding llm interactions with small, specialised Encoder Ensembles
Dom Nasrabadi
31
1
0
11 Oct 2024
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Ji-Rong Wen
58
9
0
10 Oct 2024
ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities
Zhenchao Jin
Mengchen Liu
Dongdong Chen
Lingting Zhu
Yunsheng Li
Lequan Yu
KELM
31
0
0
08 Oct 2024
LLM With Tools: A Survey
Zhuocheng Shen
43
9
0
24 Sep 2024
SEAL: Suite for Evaluating API-use of LLMs
Woojeong Kim
Ashish Jagmohan
Aditya Vempaty
ELM
ALM
LLMAG
35
0
0
23 Sep 2024
ProcessTBench: An LLM Plan Generation Dataset for Process Mining
Andrei Cosmin Redis
M. Sani
Bahram Zarrin
Andrea Burattin
15
1
0
13 Sep 2024
NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls
Kinjal Basu
Ibrahim Abdelaziz
Kelsey Bradford
M. Crouse
Kiran Kate
...
Yara Rizk
Xin Wang
Luis A. Lastras
Pavan Kapanipathi
Pavan Kapanipathi
33
7
0
04 Sep 2024
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Ruisheng Cao
Fangyu Lei
Haoyuan Wu
Jixuan Chen
Yeqiao Fu
...
Qian Liu
Victor Zhong
Lu Chen
Kai Yu
Tao Yu
35
18
0
15 Jul 2024
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
Haiyang Shen
Yue Li
Desong Meng
Dongqi Cai
Sheng Qi
Li Zhang
Mengwei Xu
Yun Ma
LLMAG
46
9
0
28 Jun 2024
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
Ibrahim Abdelaziz
Kinjal Basu
Mayank Agarwal
Sadhana Kumaravel
Matthew Stallone
...
Merve Unuvar
David D. Cox
Salim Roukos
Luis A. Lastras
Pavan Kapanipathi
LLMAG
34
20
0
27 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Shu Wang
Xing Xie
Xing Xie
ALM
ELM
50
5
0
20 Jun 2024
Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?
Seungbin Yang
chaeHun Park
Taehee Kim
Jaegul Choo
46
2
0
18 Jun 2024
τ
τ
τ
-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Shunyu Yao
Noah Shinn
P. Razavi
Karthik Narasimhan
ALM
41
55
0
17 Jun 2024
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models
Fangzhi Xu
Qiushi Sun
Kanzhi Cheng
Xiaozhong Liu
Yu Qiao
Zhiyong Wu
LLMAG
38
5
0
17 Jun 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
LLMAG
31
80
0
28 May 2024
Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents
Zhengliang Shi
Shen Gao
Xiuyi Chen
Yue Feng
Lingyong Yan
Haibo Shi
Dawei Yin
Zhumin Chen
Suzan Verberne
LLMAG
47
15
0
26 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
69
44
0
23 May 2024
Value Augmented Sampling for Language Model Alignment and Personalization
Seungwook Han
Idan Shenfeld
Akash Srivastava
Yoon Kim
Pulkit Agrawal
OffRL
36
23
0
10 May 2024
CACTUS: Chemistry Agent Connecting Tool-Usage to Science
Andrew D. McNaughton
Gautham Ramalaxmi
Agustin Kruel
C. Knutson
R. Varikoti
Neeraj Kumar
50
7
0
02 May 2024
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Olly Styles
Sam Miller
Patricio Cerda-Mardini
T. Guha
Victor Sanchez
Bertie Vidgen
LLMAG
33
3
0
01 May 2024
From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen
Xintao Wang
Rui Xu
Siyu Yuan
Yikai Zhang
...
Caiyu Hu
Siye Wu
Scott Ren
Ziquan Fu
Yanghua Xiao
62
77
0
28 Apr 2024
What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang
Zhoujun Cheng
Hao Zhu
Daniel Fried
Graham Neubig
65
27
0
18 Mar 2024
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
ALM
ELM
64
50
0
15 Feb 2024
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
Zhen Guo
Adriana Meza Soria
Wei Sun
Yikang Shen
Rameswar Panda
ELM
ALM
55
1
0
14 Feb 2024
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Yu Du
Fangyun Wei
Hongyang R. Zhang
LLMAG
32
37
0
06 Feb 2024
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Jian Xie
Kai Zhang
Jiangjie Chen
Tinghui Zhu
Renze Lou
Yuandong Tian
Yanghua Xiao
Yu-Chuan Su
LLMAG
LM&Ro
62
129
0
02 Feb 2024
Executable Code Actions Elicit Better LLM Agents
Xingyao Wang
Yangyi Chen
Lifan Yuan
Yizhe Zhang
Yunzhu Li
Hao Peng
Heng Ji
ELM
LLMAG
LM&Ro
37
131
0
01 Feb 2024
RE-GAINS & EnChAnT: Intelligent Tool Manipulation Systems For Enhanced Query Responses
Sahil Girhepuje
Siva Sankar Sajeev
Purvam Jain
Arya Sikder
Adithya Rama Varma
Ryan George
Akshay Govind Srinivasan
Mahendra Kurup
Ashmit Sinha
Sudip Mondal
RALM
31
0
0
28 Jan 2024
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Chang Ma
Junlei Zhang
Zhihao Zhu
Cheng Yang
Yujiu Yang
Yaohui Jin
Zhenzhong Lan
Lingpeng Kong
Junxian He
ELM
LLMAG
37
54
0
24 Jan 2024
Assessing and Understanding Creativity in Large Language Models
Yunpu Zhao
Rui Zhang
Wenyi Li
Di Huang
Jiaming Guo
...
Xingui Hu
Zidong Du
Qi Guo
Ling Li
Yunji Chen
LRM
35
19
0
23 Jan 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Chenyu Wang
Weixin Luo
Qianyu Chen
Haonan Mai
Jindi Guo
Sixun Dong
Xiaohua Xuan
MLLM
LLMAG
44
19
0
19 Jan 2024
A Study on Training and Developing Large Language Models for Behavior Tree Generation
Fu Li
Xueying Wang
Bin Li
Yunlong Wu
Yanzhen Wang
Xiaodong Yi
14
4
0
16 Jan 2024
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Shafiq R. Joty
ELM
CLL
AI4MH
LRM
ALM
85
27
0
28 Nov 2023
GAIA: a benchmark for General AI Assistants
Grégoire Mialon
Clémentine Fourrier
Craig Swift
Thomas Wolf
Yann LeCun
Thomas Scialom
AI4MH
ALM
ELM
RALM
15
141
0
21 Nov 2023
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Yilun Kong
Jingqing Ruan
Yihong Chen
Bin Zhang
Tianpeng Bao
...
Xiaoru Hu
Hangyu Mao
Ziyue Li
Xingyu Zeng
Rui Zhao
LLMAG
37
37
0
19 Nov 2023
ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Nicholas Farn
Richard Shin
LLMAG
ELM
32
14
0
15 Nov 2023
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Yuchen Zhuang
Xiang Chen
Tong Yu
Saayan Mitra
Victor S. Bursztyn
Ryan A. Rossi
Somdeb Sarkhel
Chao Zhang
LLMAG
36
53
0
20 Oct 2023
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Yaofang Liu
Xiaodong Cun
Xuebo Liu
Xintao Wang
Yong Zhang
Haoxin Chen
Yang Liu
Tieyong Zeng
Raymond H. F. Chan
Ying Shan
VGen
EGVM
18
127
0
17 Oct 2023
Do Large Language Models Know about Facts?
Xuming Hu
Junzhe Chen
Xiaochuan Li
Yingxin Lai
Lijie Wen
Philip S. Yu
Zhijiang Guo
HILM
KELM
31
49
0
08 Oct 2023
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
Yue Huang
Jiawen Shi
Yuan Li
Chenrui Fan
Siyuan Wu
...
Yixin Liu
Pan Zhou
Yao Wan
Neil Zhenqiang Gong
Lichao Sun
LLMAG
40
81
0
04 Oct 2023
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
Lifan Yuan
Yangyi Chen
Xingyao Wang
Yi Ren Fung
Hao Peng
Heng Ji
LLMAG
KELM
27
58
0
29 Sep 2023
1
2
Next