ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

14 August 2023
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
    ELM
    LLMAG
    ALM

Papers citing "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"

50 / 97 papers shown
The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News
Yuhan Liu
Yong-Jin Liu
Xiaoqing Zhang
Xiuying Chen
Rui Yan
LLMAG
45
0
0
13 May 2025
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations
Chalamalasetti Kranti
Sherzod Hakimov
David Schlangen
LLMAG
49
0
0
08 May 2025
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Wenhao Li
Bo Jin
Mingyi Hong
Changhong Lu
Xiangfeng Wang
48
0
0
07 May 2025
am-ELO: A Stable Framework for Arena-based LLM Evaluation
Zirui Liu
Jiatong Li
Yan Zhuang
Qiang Liu
Shuanghong Shen
Jie Ouyang
Mingyue Cheng
Shijin Wang
44
1
0
06 May 2025
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Bang Zhang
Ruotian Ma
Qingxuan Jiang
Peisong Wang
Jiaqi Chen
...
Fanghua Ye
Jian Li
Yifan Yang
Zhaopeng Tu
Xiaolong Li
LLMAG
ELM
ALM
111
0
1
01 May 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
Jing Liu
Zhiguang Han
...
Beibin Li
Chi Wang
H. Wang
Yuxiao Chen
Qingyun Wu
49
1
0
30 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
2
0
26 Apr 2025
Planet as a Brain: Towards Internet of AgentSites based on AIOS Server
Xiang Zhang
Yongfeng Zhang
44
0
0
19 Apr 2025
Evaluation Under Imperfect Benchmarks and Ratings: A Case Study in Text Simplification
Joseph Liu
Yoonsoo Nam
Xinyue Cui
Swabha Swayamdipta
56
0
0
13 Apr 2025
Large Language Models as Particle Swarm Optimizers
Yamato Shinohara
Jinglue Xu
Tianshui Li
Hitoshi Iba
31
0
0
12 Apr 2025
Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning
Sugyeong Eo
Hyeonseok Moon
Evelyn Hayoon Zi
Chanjun Park
Heuiseok Lim
LLMAG
48
1
0
07 Apr 2025
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
Jianhao Chen
Zishuo Xun
Bocheng Zhou
Han Qi
Qiaosheng Zhang
...
Wei Hu
Yuzhong Qu
W. Ouyang
Wanli Ouyang
Shuyue Hu
74
1
0
01 Apr 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
54
0
0
29 Mar 2025
Don't lie to your friends: Learning what you know from collaborative self-play
Jacob Eisenstein
Reza Aghajani
Adam Fisch
Dheeru Dua
Fantine Huot
Mirella Lapata
Vicky Zayats
Jonathan Berant
72
0
0
18 Mar 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
Presented at ResearchTrend Connect | LLMAG on 23 Apr 2025
131
9
0
17 Mar 2025
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval
Yu Zhang
Shutong Qiao
Jiaqi Zhang
Tzu-Heng Lin
Chen Gao
Yong Li
LM&Ro
LM&MA
90
1
0
07 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
62
0
0
06 Mar 2025
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
Yuxuan Liu
Hongda Sun
Wei Liu
Jian Luan
Bo Du
Rui Yan
58
2
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
71
8
0
24 Feb 2025
The Hidden Strength of Disagreement: Unraveling the Consensus-Diversity Tradeoff in Adaptive Multi-Agent Systems
Zengqing Wu
Takayuki Ito
42
0
0
23 Feb 2025
Reading between the Lines: Can LLMs Identify Cross-Cultural Communication Gaps?
Sougata Saha
Saurabh Kumar Pandey
Harshit Gupta
Monojit Choudhury
55
0
0
21 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
60
3
0
21 Feb 2025
M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
Zhaopeng Feng
Jiayuan Su
Jiamei Zheng
Jiahan Ren
Yan Zhang
Jian Wu
Hongwei Wang
Zuozhu Liu
ELM
208
0
0
21 Feb 2025
KIMAs: A Configurable Knowledge Integrated Multi-Agent System
Zitao Li
Fei Wei
Yuexiang Xie
Dawei Gao
Weirui Kuang
Zhijian Ma
Bingchen Qian
Yaliang Li
Bolin Ding
63
0
0
13 Feb 2025
SedarEval: Automated Evaluation using Self-Adaptive Rubrics
Zhiyuan Fan
Weinong Wang
Xing Wu
Debing Zhang
41
1
0
28 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
67
96
0
03 Jan 2025
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice H. Oh
Seohyon Jung
94
1
0
03 Jan 2025
What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents
Mingyu Jin
Beichen Wang
Zhaoqian Xue
Suiyuan Zhu
Wenyue Hua
Hua Tang
Kai Mei
Jundong Li
Yongfeng Zhang
LM&Ro
LLMAG
87
10
0
03 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
93
12
0
31 Dec 2024
AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs' Complex Reasoning Capabilities
Fabrizio Davide
Pietro Torre
Andrea Gaggioli
ELM
157
0
0
12 Dec 2024
AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework
Meihao Fan
Ju Fan
Nan Tang
Lei Cao
Guoliang Li
Xiaoyong Du
LMTD
123
0
0
10 Dec 2024
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
Kangyu Zhu
Peng Xia
Yun-Qing Li
Hongtu Zhu
Sheng Wang
Huaxiu Yao
103
1
0
09 Dec 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
123
70
0
25 Nov 2024
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
Y. Qi
Hao Peng
Xinyu Wang
Bin Xu
Lei Hou
Juanzi Li
61
1
0
31 Oct 2024
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
Jingxuan Chen
Derek Yuen
Bin Xie
Yuqing Yang
Gongwei Chen
...
Liqiang Nie
Yasheng Wang
Jianye Hao
Jun Wang
Kun Shao
LLMAG
50
5
0
19 Oct 2024
Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges
Handi Chen
Weipeng Deng
Shuo Yang
J. Xu
Zhihan Jiang
Edith C.H. Ngai
Jiangchuan Liu
Xue Liu
ELM
21
1
0
16 Oct 2024
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELM
ALM
56
38
0
16 Oct 2024
G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks
Guibin Zhang
Xinfeng Li
Xiangguo Sun
Guancheng Wan
Miao Yu
Junfeng Fang
Kun Wang
Dawei Cheng
AAML
AI4CE
51
7
0
15 Oct 2024
Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
Yi-Fan Lu
Xian-Ling Mao
Tian Lan
Heyan Huang
Xiaoyan Gao
55
0
0
12 Oct 2024
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations
Yuchen Fan
Xin Zhong
Heng Zhou
Yuchen Zhang
Mingyu Liang
Chengxing Xie
Ermo Hua
Ning Ding
Bowen Zhou
ALM
ELM
31
0
0
02 Oct 2024
HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations
Ziyu Wang
Hao Li
Di Huang
Amir M. Rahmani
Chae-Won Shin
LM&MA
48
9
0
28 Sep 2024
GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion
Tongxuan Liu
Xingyu Wang
Weizhe Huang
Wenjiang Xu
Yuting Zeng
Lei Jiang
Hailong Yang
Jing Li
LLMAG
44
8
0
21 Sep 2024
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Ilya Gusev
LLMAG
58
3
0
10 Sep 2024
LLM-based multi-agent poetry generation in non-cooperative environments
Ran Zhang
Steffen Eger
LLMAG
37
5
0
05 Sep 2024
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang
Jiayi Yuan
Yu-Neng Chuang
Zhuoer Wang
Yingchi Liu
Mark Cusick
Param Kulkarni
Zhengping Ji
Yasser Ibrahim
Xia Hu
LM&MA
ELM
49
3
0
25 Aug 2024
Can LLMs Replace Manual Annotation of Software Engineering Artifacts?
Toufique Ahmed
Premkumar Devanbu
Christoph Treude
Michael Pradel
77
11
0
10 Aug 2024
Self-Emotion Blended Dialogue Generation in Social Simulation Agents
Qiang Zhang
Jason Naradowsky
Yusuke Miyao
31
2
0
03 Aug 2024
CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation
Renhao Li
Minghuan Tan
Derek F. Wong
Min Yang
LLMAG
23
1
0
11 Jun 2024
Scaling Large Language Model-based Multi-Agent Collaboration
Chen Qian
Zihao Xie
YiFei Wang
Wei Liu
Yufan Dang
...
Zhuoyun Du
Weize Chen
Cheng Yang
Zhiyuan Liu
Maosong Sun
AI4CE
LLMAG
LM&Ro
66
46
0
11 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
105
31
0
09 Jun 2024