arXiv: 2501.10132
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
20 January 2025
Lucen Zhong
Zhengxiao Du
Xiaohan Zhang
Haiyi Hu
J. Tang
LLMAG
Papers citing
"ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario"
6 / 6 papers shown
Invocable APIs derived from NL2SQL datasets for LLM Tool-Calling Evaluation
Benjamin Elder
Anupama Murthi
J. Kang
Ankita Rajaram Naik
Kiran Kate
Kinjal Basu
Danish Contractor
20
0
0
12 Jun 2025
Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey
Jiachen Zhu
Menghui Zhu
Renting Rui
Rong Shan
Congmin Zheng
...
Jianghao Lin
Weiwen Liu
Ruiming Tang
Yong Yu
Weinan Zhang
LLMAG
ELM
38
0
0
06 Jun 2025
ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions
Beong-woo Kwak
Minju Kim
Dongha Lim
Hyungjoo Chae
Dongjin Kang
Sunghwan Kim
Dongil Yang
Jinyoung Yeo
LLMAG
RALM
71
0
0
29 May 2025
Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling
Ishan Kavathekar
Raghav Donakanti
Ponnurangam Kumaraguru
Karthik Vaidhyanathan
142
1
0
27 Apr 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 07 May 2025
197
14
0
20 Mar 2025
Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies
Luyi Jiang
Jiasi Chen
Lu Lu
Xinwei Peng
Lihao Liu
Junjun He
Jie Xu
ELM
LM&MA
80
0
0
10 Mar 2025