RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning

    AAML

Papers citing "RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning"
