ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

17 June 2024
Shunyu Yao
Noah Shinn
P. Razavi
Karthik Narasimhan
    ALM
arXiv: 2406.12045

Papers citing "τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains"

21 / 21 papers shown
TRAIL: Trace Reasoning and Agentic Issue Localization
Darshan Deshpande
Varun Gangal
Hersh Mehta
Jitin Krishnan
Anand Kannappan
Rebecca Qian
27
0
0
13 May 2025
Evaluating LLM Metrics Through Real-World Capabilities
Justin K Miller
Wenjia Tang
ELM
ALM
44
0
0
13 May 2025
Measuring General Intelligence with Generated Games
Vivek Verma
David Huang
William Chen
Dan Klein
Nicholas Tomlin
ReLM
ELM
LM&MA
LRM
53
0
0
12 May 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Joykirat Singh
Raghav Magazine
Yash Pandya
A. Nambi
LLMAG
KELM
OffRL
LRM
156
2
0
28 Apr 2025
When2Call: When (not) to Call Tools
Hayley Ross
Ameya Sunil Mahabaleshwarkar
Yoshi Suhara
93
0
0
26 Apr 2025
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
57
2
0
21 Apr 2025
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
160
2
0
15 Apr 2025
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
Tengjun Jin
Yuxuan Zhu
Daniel Kang
LMTD
ELM
47
0
0
07 Apr 2025
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
Akshara Prabhakar
Ziqiang Liu
Weiran Yao
Jianguo Zhang
Ming Zhu
...
Juan Carlos Niebles
Shelby Heinecke
Han Wang
Shri Kiran Srinivasan
Caiming Xiong
VGen
87
2
0
04 Apr 2025
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Peijie Yu
Yifan Yang
Jiajian Li
Zelong Zhang
Haorui Wang
Xiao Feng
Feng Zhang
LLMAG
117
0
0
03 Apr 2025
Towards Personalized Conversational Sales Agents : Contextual User Profiling for Strategic Action
Tongyoung Kim
Jeongeun Lee
Soojin Yoon
S. Kim
Dongha Lee
27
0
0
28 Mar 2025
MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment
Weicong Qin
Yi Xu
Weijie Yu
Chenglei Shen
Ming He
Jianping Fan
Xiao Zhang
Jun Xu
53
0
0
03 Mar 2025
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng
Zhiyuan Huang
Zhenghai Xue
Xinrun Wang
Bo An
Shuicheng Yan
88
14
0
17 Feb 2025
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Kung-Hsiang Huang
Akshara Prabhakar
Sidharth Dhawan
Yixin Mao
Huan Wang
Silvio Savarese
Caiming Xiong
Philippe Laban
C. Wu
44
7
0
04 Nov 2024
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Zachary S. Siegel
Sayash Kapoor
Nitya Nadgir
Benedikt Stroebl
Arvind Narayanan
34
8
0
17 Sep 2024
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Jiarui Lu
Thomas Holleis
Yizhe Zhang
Bernhard Aumayer
Feng Nan
...
Shen Ma
Mengyu Li
Guoli Yin
Zirui Wang
Ruoming Pang
LLMAG
ELM
36
29
0
08 Aug 2024
AI Agents That Matter
Sayash Kapoor
Benedikt Stroebl
Zachary S. Siegel
Nitya Nadgir
Arvind Narayanan
49
36
0
01 Jul 2024
USimAgent: Large Language Models for Simulating Search Users
Erhan Zhang
Xingzhu Wang
Peiyuan Gong
Yankai Lin
Jiaxin Mao
LLMAG
35
17
0
14 Mar 2024
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
232
1,754
0
07 Apr 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
267
2,510
0
06 Oct 2022
Task-Oriented Dialogue as Dataflow Synthesis
Semantic Machines
Jacob Andreas
J. Bufe
David Burkett
Charles C. Chen
...
Izabela Witoszko
Jason Wolfe
A. Wray
Yuchen Zhang
Alexander Zotov
AIFin
195
153
0
24 Sep 2020