Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.00212
Cited By
v1
v2
v3 (latest)
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
30 April 2025
Shaokun Zhang
Ming Yin
Jieyu Zhang
Jing Liu
Zhiguang Han
Jingyang Zhang
Beibin Li
Chi Wang
Hongru Wang
Yuxiao Chen
Qingyun Wu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems"
38 / 38 papers shown
Title
From Virtual Agents to Robot Teams: A Multi-Robot Framework Evaluation in High-Stakes Healthcare Context
Yuanchen Bai
Zijian Ding
Angelique Taylor
66
0
0
04 Jun 2025
Why do AI agents communicate in human language?
Pengcheng Zhou
Yinglun Feng
Halimulati Julaiti
Zhongliang Yang
LLMAG
41
0
0
03 Jun 2025
The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets
Shenzhe Zhu
Jiao Sun
Yi Nian
Tobin South
Alex Pentland
Jiaxin Pei
40
0
0
29 May 2025
Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks
Yuxuan Li
Aoi Naito
Hirokazu Shirado
LLMAG
111
1
0
15 May 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
392
2,024
0
22 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
368
112
0
25 Nov 2024
Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
Adam Fourney
Gagan Bansal
Hussein Mozannar
Cheng Tan
Eduardo Salinas
...
Victor C. Dibia
Ahmed Hassan Awadallah
Ece Kamar
Rafah Hosn
Saleema Amershi
AI4CE
LRM
LLMAG
136
50
0
07 Nov 2024
EcoAct: Economic Agent Determines When to Register What Action
Shaokun Zhang
Jieyu Zhang
Dujian Ding
Mirian Hipolito Garcia
Ankur Mallick
Daniel Madrigal
Menglin Xia
Victor Rühle
Qingyun Wu
Chi Wang
LLMAG
100
4
0
03 Nov 2024
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELM
ALM
153
52
0
16 Oct 2024
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
...
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
149
44
0
14 Oct 2024
Language Model Preference Evaluation with Multiple Weak Evaluators
Zhengyu Hu
Jieyu Zhang
Zhihan Xiong
Alexander Ratner
Hui Xiong
Ranjay Krishna
181
5
0
14 Oct 2024
SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning
Alireza Ghafarollahi
Markus J. Buehler
LLMAG
AI4CE
52
42
0
09 Sep 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
Ori Yoran
S. Amouyal
Chaitanya Malaviya
Ben Bogin
Ofir Press
Jonathan Berant
LLMAG
109
44
0
22 Jul 2024
Needle in the Haystack for Memory Based Large Language Models
Elliot Nelson
Georgios Kollias
Payel Das
Subhajit Chaudhury
Soham Dan
KELM
RALM
117
15
0
01 Jul 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
177
65
0
18 Jun 2024
Adaptive In-conversation Team Building for Language Model Agents
Linxin Song
Jiale Liu
Jieyu Zhang
Shaokun Zhang
Ao Luo
Shijian Wang
Qingyun Wu
Chi Wang
LLMAG
153
14
0
29 May 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie
Danyang Zhang
Jixuan Chen
Xiaochuan Li
Siheng Zhao
...
Shuyan Zhou
Silvio Savarese
Caiming Xiong
Victor Zhong
Tao Yu
133
176
0
11 Apr 2024
Offline Training of Language Model Agents with Functions as Learnable Weights
Shaokun Zhang
Jieyu Zhang
Jiale Liu
Linxin Song
Chi Wang
Ranjay Krishna
Qingyun Wu
LLMAG
LM&Ro
AIFin
100
18
0
17 Feb 2024
GAIA: a benchmark for General AI Assistants
Grégoire Mialon
Clémentine Fourrier
Craig Swift
Thomas Wolf
Yann LeCun
Thomas Scialom
AI4MH
ALM
ELM
RALM
98
186
0
21 Nov 2023
Evil Geniuses: Delving into the Safety of LLM-based Agents
Yu Tian
Xiao Yang
Jingyuan Zhang
Yinpeng Dong
Hang Su
LLMAG
AAML
100
67
0
20 Nov 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik Narasimhan
ELM
141
647
0
10 Oct 2023
EcoAssistant: Using LLM Assistant More Affordably and Accurately
Jieyu Zhang
Ranjay Krishna
Ahmed Hassan Awadallah
Chi Wang
86
40
0
03 Oct 2023
Large Language Models Cannot Self-Correct Reasoning Yet
Jie Huang
Xinyun Chen
Swaroop Mishra
Huaixiu Steven Zheng
Adams Wei Yu
Xinying Song
Denny Zhou
ReLM
LRM
103
488
0
03 Oct 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM
LLMAG
ALM
99
504
0
14 Aug 2023
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
Ning Miao
Yee Whye Teh
Tom Rainforth
ReLM
LRM
83
135
0
01 Aug 2023
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Sirui Hong
Mingchen Zhuge
Jonathan Chen
Xiawu Zheng
Yuheng Cheng
...
Liyang Zhou
Chenyu Ran
Lingfeng Xiao
Chenglin Wu
Jürgen Schmidhuber
LLMAG
AIFin
122
548
0
01 Aug 2023
Mind2Web: Towards a Generalist Agent for the Web
Xiang Deng
Yu Gu
Boyuan Zheng
Shijie Chen
Samuel Stevens
Boshi Wang
Huan Sun
Yu-Chuan Su
LLMAG
141
488
0
09 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
581
4,455
0
09 Jun 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao
Dian Yu
Jeffrey Zhao
Izhak Shafran
Thomas Griffiths
Yuan Cao
Karthik Narasimhan
LM&Ro
LRM
AI4CE
227
2,055
0
17 May 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Ge Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Guohao Li
SyDa
ALM
175
520
0
31 Mar 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
231
1,215
0
29 Mar 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Noah Shinn
Federico Cassano
Beck Labash
A. Gopinath
Karthik Narasimhan
Shunyu Yao
LLMAG
KELM
145
1,328
0
20 Mar 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.6K
14,832
0
15 Mar 2023
GPTScore: Evaluate as You Desire
Jinlan Fu
See-Kiong Ng
Zhengbao Jiang
Pengfei Liu
LM&MA
ALM
ELM
188
291
0
08 Feb 2023
Towards Reasoning in Large Language Models: A Survey
Jie Huang
Kevin Chen-Chuan Chang
LM&MA
ELM
LRM
174
645
0
20 Dec 2022
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
473
2,998
0
06 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
1.0K
9,796
0
28 Jan 2022
Hyper-Parameter Optimization: A Review of Algorithms and Applications
Tong Yu
Hong Zhu
AAML
96
542
0
12 Mar 2020
1