Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.13304
Cited By
ToolQA: A Dataset for LLM Question Answering with External Tools
23 June 2023
Yuchen Zhuang
Yue Yu
Kuan-Chieh Jackson Wang
Haotian Sun
Chao Zhang
ELM
LLMAG
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ToolQA: A Dataset for LLM Question Answering with External Tools"
50 / 50 papers shown
Title
Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents
Ratnadira Widyasari
Martin Weyssow
Ivana Clairine Irsan
Han Wei Ang
Frank Liauw
Eng Lieh Ouh
Lwin Khin Shar
Hong Jin Kang
David Lo
LLMAG
19
0
0
16 May 2025
TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers
Aiyao He
Sijia Cui
Shuai Xu
Yanna Wang
Bo Xu
39
0
0
13 May 2025
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
Zehao Fan
Garrett Gagnon
Zhenyu Liu
Liu Liu
29
0
0
09 May 2025
When2Call: When (not) to Call Tools
Hayley Ross
Ameya Sunil Mahabaleshwarkar
Yoshi Suhara
95
0
0
26 Apr 2025
Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
Lei Shen
Xiaoyu Shen
61
0
0
25 Apr 2025
OAEI-LLM-T: A TBox Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching
Zhangcheng Qiang
Kerry Taylor
Weiqing Wang
Jing Jiang
54
0
0
25 Mar 2025
Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity
Yupu Hao
Pengfei Cao
Zhuoran Jin
Huanxuan Liao
Yubo Chen
Kang Liu
Jun Zhao
LLMAG
142
1
0
02 Mar 2025
Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research
Veda C. Storey
Wei Thoo Yue
J. Leon Zhao
Roman Lukyanenko
43
0
0
25 Feb 2025
Grounding LLM Reasoning with Knowledge Graphs
Alfonso Amayuelas
Joy Prakash Sain
Simerjot Kaur
Charese Smiley
86
0
0
18 Feb 2025
MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling
Yakun Zhu
Shaohang Wei
Xu Wang
Kui Xue
Xiaofan Zhang
S. Zhang
62
1
0
17 Feb 2025
Learning Musical Representations for Music Performance Question Answering
Xingjian Diao
Chunhui Zhang
Tingxuan Wu
Ming Cheng
Z. Ouyang
Weiyi Wu
Jiang Gui
73
7
0
10 Feb 2025
Self-Training Large Language Models for Tool-Use Without Demonstrations
Ne Luo
Aryo Pradipta Gema
Xuanli He
Emile van Krieken
Pietro Lesci
Pasquale Minervini
LLMAG
79
1
0
09 Feb 2025
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
Junjie Ye
Zhengyin Du
Xuesong Yao
Weijian Lin
Yufei Xu
...
Siyu Yuan
Tao Gui
Qi Zhang
Xuanjing Huang
Jiecao Chen
59
0
0
05 Jan 2025
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu
Yufei Guo
Yuan Meng
Yanning Zhang
Le Sun
Zhi Wang
216
0
0
25 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
78
0
0
12 Nov 2024
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning
Shiyu Hu
Xuchen Li
Xuzhao Li
Jing Zhang
Yipei Wang
Xin Zhao
Kang Hao Cheong
VLM
26
1
0
20 Oct 2024
Learning Evolving Tools for Large Language Models
Guoxin Chen
Zhong Zhang
Xin Cong
Fangda Guo
Yesai Wu
Yankai Lin
Wenzheng Feng
Yasheng Wang
KELM
52
1
0
09 Oct 2024
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
Aiwei Liu
Sheng Guan
Yong-Jin Liu
L. Pan
Yifei Zhang
Liancheng Fang
Lijie Wen
Philip S. Yu
Xuming Hu
WaLM
158
2
0
04 Oct 2024
LLM With Tools: A Survey
Zhuocheng Shen
43
8
0
24 Sep 2024
Automated test generation to evaluate tool-augmented LLMs as conversational AI agents
Samuel Arcadinho
David Aparicio
Mariana Almeida
34
5
0
24 Sep 2024
Learning to Ask: When LLM Agents Meet Unclear Instruction
Wenxuan Wang
Juluan Shi
Chaozheng Wang
Cheryl Lee
Chaozheng Wang
Cheryl Lee
Youliang Yuan
Jen-tse Huang
Wenxiang Jiao
Michael R. Lyu
LLMAG
34
8
0
31 Aug 2024
Simulating Financial Market via Large Language Model based Agents
Shen Gao
Yuntao Wen
Minghang Zhu
Jianing Wei
Yuhan Cheng
Qunzi Zhang
Shuo Shang
AIFin
34
12
0
28 Jun 2024
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
Haiyang Shen
Yue Li
Desong Meng
Dongqi Cai
Sheng Qi
Li Zhang
Mengwei Xu
Yun Ma
LLMAG
46
9
0
28 Jun 2024
Satyrn: A Platform for Analytics Augmented Generation
Marko Sterbentz
Cameron Barrie
Shubham Shahi
Abhratanu Dutta
Donna Hooshmand
Harper Pack
Kristian J. Hammond
36
0
0
17 Jun 2024
CancerLLM: A Large Language Model in Cancer Domain
Mingchen Li
Jiatan Huang
Jeremy Yeung
A. Blaes
Steven Johnson
Hongfang Liu
Hua Xu
Rui Zhang
ELM
LM&MA
32
4
0
15 Jun 2024
Are Large Language Models Good Statisticians?
Yizhang Zhu
Shiyin Du
Boyan Li
Yuyu Luo
Nan Tang
ELM
40
15
0
12 Jun 2024
Transforming Wearable Data into Health Insights using Large Language Model Agents
Mike A. Merrill
Akshay Paruchuri
Naghmeh Rezaei
Geza Kovacs
Javier Perez
...
Shwetak Patel
Jiening Zhan
Tim Althoff
Daniel J. McDuff
Xin Liu
LM&MA
LLMAG
AI4CE
54
9
0
10 Jun 2024
HYDRA: Model Factorization Framework for Black-Box LLM Personalization
Yuchen Zhuang
Haotian Sun
Yue Yu
Rushi Qiang
Qifan Wang
Chao Zhang
Bo Dai
AAML
53
15
0
05 Jun 2024
A Survey of Large Language Models on Generative Graph Analytics: Query, Learning, and Applications
Wenbo Shang
Xin Huang
29
9
0
23 Apr 2024
Evalverse: Unified and Accessible Library for Large Language Model Evaluation
Jihoo Kim
Wonho Song
Dahyun Kim
Yunsu Kim
Yungi Kim
Chanjun Park
ELM
69
3
0
01 Apr 2024
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Deyi Xiong
ELM
43
0
0
19 Mar 2024
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
Zhicheng Guo
Sijie Cheng
Hao Wang
Shihao Liang
Yujia Qin
Peng Li
Zhiyuan Liu
Maosong Sun
Yang Liu
ELM
52
23
0
12 Mar 2024
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Juyeon Kim
Jeongeun Lee
Yoonho Chang
Chanyeol Choi
Junseong Kim
Jy-yong Sohn
KELM
LRM
56
2
0
27 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
371
0
09 Feb 2024
Bringing Generative AI to Adaptive Learning in Education
Hang Li
Tianlong Xu
Chaoli Zhang
Eason Chen
Jing Liang
Xing Fan
Haoyang Li
Jiliang Tang
Qingsong Wen
48
22
0
02 Feb 2024
RE-GAINS & EnChAnT: Intelligent Tool Manipulation Systems For Enhanced Query Responses
Sahil Girhepuje
Siva Sankar Sajeev
Purvam Jain
Arya Sikder
Adithya Rama Varma
Ryan George
Akshay Govind Srinivasan
Mahendra Kurup
Ashmit Sinha
Sudip Mondal
RALM
37
0
0
28 Jan 2024
Large Language Models Can Learn Temporal Reasoning
Siheng Xiong
Ali Payani
Ramana Rao Kompella
Faramarz Fekri
LRM
29
75
0
12 Jan 2024
LLM-SQL-Solver: Can LLMs Determine SQL Equivalence?
Fuheng Zhao
Lawrence Lim
Ishtiyaque Ahmad
D. Agrawal
A. El Abbadi
Amr El Abbadi
65
9
0
16 Dec 2023
ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Nicholas Farn
Richard Shin
LLMAG
ELM
40
14
0
15 Nov 2023
PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature
Jerry Junyang Cheung
Yuchen Zhuang
Yinghao Li
Pranav Shetty
Wantian Zhao
Sanjeev Grampurohit
R. Ramprasad
Chao Zhang
AI4CE
14
11
0
13 Nov 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Yujia Qin
Shi Liang
Yining Ye
Kunlun Zhu
Lan Yan
...
Jie Zhou
Mark B. Gerstein
Dahai Li
Zhiyuan Liu
Maosong Sun
CLL
ALM
LLMAG
ELM
LM&MA
87
628
0
31 Jul 2023
GenQ: Automated Question Generation to Support Caregivers While Reading Stories with Children
Arun Balajiee Lekshmi Narayanan
Ligia E. Gómez
Martha Michelle Soto Fernandez
Tri Minh Nguyen
Chris Blais
M. Restrepo
A. Glenberg
AI4Ed
21
1
0
26 May 2023
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
Jian Xie
Kai Zhang
Jiangjie Chen
Renze Lou
Yu-Chuan Su
RALM
222
156
0
22 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
328
2,232
0
22 Mar 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
273
2,510
0
06 Oct 2022
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
Aman Madaan
Amir Yazdanbakhsh
LRM
154
116
0
16 Sep 2022
Is a Question Decomposition Unit All We Need?
Pruthvi H. Patel
Swaroop Mishra
Mihir Parmar
Chitta Baral
ReLM
158
51
0
25 May 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,077
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
357
12,003
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
398
8,559
0
28 Jan 2022
1