ResearchTrend.AI

RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning

16 January 2024
Junjie Ye
Yilong Wu
Songyang Gao
Caishuang Huang
Sixian Li
Guanyu Li
Xiaoran Fan
Qi Zhang
Tao Gui
Xuanjing Huang
    AAML

Papers citing "RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning"

11 / 11 papers shown
ToolSpectrum : Towards Personalized Tool Utilization for Large Language Models
Zihao Cheng
Hongru Wang
Zeming Liu
Yuhang Guo
Yuanfang Guo
Yunhong Wang
Haifeng Wang
89
0
0
19 May 2025
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
Junjie Ye
Zhengyin Du
Xuesong Yao
Weijian Lin
Yufei Xu
...
Siyu Yuan
Tao Gui
Qi Zhang
Xuanjing Huang
Jiecao Chen
114
0
0
05 Jan 2025
Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?
Seungbin Yang
Yujin Baek
Taehee Kim
Jaegul Choo
66
2
0
18 Jun 2024
ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages
Junjie Ye
Sixian Li
Guanyu Li
Caishuang Huang
Songyang Gao
Yilong Wu
Qi Zhang
Tao Gui
Xuanjing Huang
LLMAG
139
27
0
16 Feb 2024
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Aohan Zeng
Mingdao Liu
Rui Lu
Bowen Wang
Xiao Liu
Yuxiao Dong
Jie Tang
LM&MA ALM LLMAG
99
183
0
19 Oct 2023
DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task
Guanting Dong
Tingfeng Hui
Zhuoma Gongque
Jinxu Zhao
Daichi Guo
Gang Zhao
Keqing He
Weiran Xu
DiffM
87
10
0
16 Oct 2023
Improving the Robustness of Summarization Systems with Dual Augmentation
Preslav Nakov
Guodong Long
Chongyang Tao
Mingzhe Li
Xin Gao
Chen Zhang
Xiangliang Zhang
AAML
51
12
0
01 Jun 2023
How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks
Xuanting Chen
Junjie Ye
Can Zu
Nuo Xu
Rui Zheng
Minlong Peng
Jie Zhou
Tao Gui
Qi Zhang
Xuanjing Huang
AI4MH ELM
61
83
0
01 Mar 2023
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Jindong Wang
Xixu Hu
Wenxin Hou
Hao Chen
Runkai Zheng
...
Weirong Ye
Xiubo Geng
Binxing Jiao
Yue Zhang
Xingxu Xie
AI4MH
124
235
0
22 Feb 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa MoMe
208
1,640
0
15 Dec 2022
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG ReLM LRM
431
2,946
0
06 Oct 2022