
RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning
Xuanjing Huang
Papers citing "RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning"