ResearchTrend.AI

arXiv 2308.07201
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

14 August 2023
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
    ELM
    LLMAG
    ALM

Papers citing "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"

50 / 102 papers shown
Scaling Large Language Model-based Multi-Agent Collaboration
Chen Qian
Zihao Xie
YiFei Wang
Wei Liu
Yufan Dang
...
Zhuoyun Du
Weize Chen
Cheng Yang
Zhiyuan Liu
Maosong Sun
AI4CE
LLMAG
LM&Ro
69
47
0
11 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
107
33
0
09 Jun 2024
Mixture-of-Agents Enhances Large Language Model Capabilities
Junlin Wang
Jue Wang
Ben Athiwaratkun
Ce Zhang
James Zou
LLMAG
AIFin
46
102
0
07 Jun 2024
A Survey of Language-Based Communication in Robotics
William Hunt
Sarvapali D. Ramchurn
Mohammad D. Soorati
LM&Ro
67
12
0
06 Jun 2024
The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches
Bhashithe Abeysinghe
Ruhan Circi
ELM
38
22
0
05 Jun 2024
Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models
Sheng-Lun Wei
Cheng-Kuang Wu
Hen-Hsen Huang
Hsin-Hsi Chen
42
11
0
05 Jun 2024
Large Language Model-Enabled Multi-Agent Manufacturing Systems
Jonghan Lim
Birgit Vogel-Heuser
Ilya Kovalenko
AI4CE
LLMAG
42
2
0
04 Jun 2024
Brainstorming Brings Power to Large Language Models of Knowledge Reasoning
Zining Qin
Chenhao Wang
Huiling Qin
Weijia Jia
LRM
45
1
0
02 Jun 2024
Building Better AI Agents: A Provocation on the Utilisation of Persona in LLM-based Conversational Agents
Guangzhi Sun
Xiao Zhan
Jose Such
44
25
0
26 May 2024
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Zhiyuan Zeng
Xiaonan Li
...
Qinyuan Cheng
Ding Wang
Xiaofeng Mou
Xipeng Qiu
XuanJing Huang
LRM
46
4
0
21 May 2024
Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging
Xiaobo Liang
Haoke Zhang
Helan Hu
Juntao Li
Jun Xu
Min Zhang
ALM
49
2
0
20 May 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu
Jiahao Xu
Yulin Yuan
Gholamreza Haffari
Longyue Wang
Weihua Luo
Kaifu Zhang
LLMAG
119
23
0
20 May 2024
The Elephant in the Room -- Why AI Safety Demands Diverse Teams
David Rostcheck
Lara Scheibling
33
0
0
07 May 2024
Large Language Models (LLMs) as Agents for Augmented Democracy
Jairo Gudiño-Rosero
Umberto Grandi
César A. Hidalgo
LLMAG
37
127
0
06 May 2024
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Yu Xia
Rui Wang
Xu Liu
Mingyan Li
Tong Yu
Xiang Chen
Julian McAuley
Shuai Li
LRM
59
19
0
24 Apr 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
59
23
0
22 Apr 2024
"I Wish There Were an AI": Challenges and AI Potential in Cancer Patient-Provider Communication
Ziqi Yang
Xuhai Xu
Bingsheng Yao
Jiachen Li
Jennifer Bagdasarian
G. Gao
Dakuo Wang
40
3
0
20 Apr 2024
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Zhengran Zeng
Wei Ye
Jindong Wang
Yue Zhang
Shikun Zhang
46
1
0
09 Apr 2024
Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
Chen Huang
Peixin Qin
Yang Deng
Wenqiang Lei
Jiancheng Lv
Tat-Seng Chua
42
6
0
04 Apr 2024
AIOS: LLM Agent Operating System
Kai Mei
Zelong Li
Wujiang Xu
Wenyue Hua
Mingyu Jin
Yongfeng Zhang
Shuyuan Xu
Ruosong Ye
Yingqiang Ge
Yongfeng Zhang
LLMAG
30
17
0
25 Mar 2024
Content Knowledge Identification with Multi-Agent Large Language Models (LLMs)
Kaiqi Yang
Yucheng Chu
Taylor Darwin
Ahreum Han
Hang Li
Hongzhi Wen
Yasemin Copur-Gencturk
Jiliang Tang
Hui Liu
40
11
0
22 Mar 2024
Prospect Personalized Recommendation on Large Language Model-based Agent Platform
Jizhi Zhang
Keqin Bao
Wenjie Wang
Yang Zhang
Wentao Shi
Wanhong Xu
Fuli Feng
Tat-Seng Chua
LLMAG
51
17
0
28 Feb 2024
Multi-Bit Distortion-Free Watermarking for Large Language Models
Massieh Kordi Boroujeny
Ya Jiang
Kai Zeng
Brian L. Mark
WaLM
VLM
43
4
0
26 Feb 2024
Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark
Preslav Nakov
Tairan Wang
Qingqing Zhu
Taicheng Guo
Shen Gao
Zhiyong Lu
Xin Gao
Xiangliang Zhang
80
2
0
22 Feb 2024
Defending Jailbreak Prompts via In-Context Adversarial Game
Yujun Zhou
Yufei Han
Haomin Zhuang
Kehan Guo
Zhenwen Liang
Hongyan Bao
Xiangliang Zhang
LLMAG
AAML
42
12
0
20 Feb 2024
High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models
Michela Lorandi
Anya Belz
6
5
0
19 Feb 2024
LLM Multi-Agent Systems: Challenges and Open Problems
Shanshan Han
Qifan Zhang
Yuhang Yao
Weizhao Jin
Zhaozhuo Xu
LLMAG
50
36
0
05 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
71
30
0
02 Feb 2024
LLM Voting: Human Choices and AI Collective Decision Making
Joshua C. Yang
Damian Dailisan
Marcin Korecki
C. I. Hausladen
Dirk Helbing
34
17
0
31 Jan 2024
PRE: A Peer Review Based Large Language Model Evaluator
Zhumin Chu
Qingyao Ai
Yiteng Tu
Haitao Li
Yiqun Liu
LRM
ALM
41
21
0
28 Jan 2024
CCA: Collaborative Competitive Agents for Image Editing
Tiankai Hang
Shuyang Gu
Dong Chen
Xin Geng
Baining Guo
35
5
0
23 Jan 2024
The Critique of Critique
Shichao Sun
Junlong Li
Weizhe Yuan
Ruifeng Yuan
Wenjie Li
Pengfei Liu
ELM
40
0
0
09 Jan 2024
Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game
Zijing Shi
Meng Fang
Shunfeng Zheng
Shilong Deng
Ling-Hao Chen
Yali Du
41
23
0
29 Dec 2023
LARP: Language-Agent Role Play for Open-World Games
Ming Yan
Ruihao Li
Hao Zhang
Hao Wang
Zhilan Yang
Ji Yan
LLMAG
LM&Ro
AI4CE
30
17
0
24 Dec 2023
Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System
Haotian Wang
Xiyuan Du
Weijiang Yu
Qianglong Chen
Kun Zhu
Zheng Chu
Lian Yan
Yi Guan
32
10
0
08 Dec 2023
Playing Large Games with Oracles and AI Debate
Xinyi Chen
Angelica Chen
Dean Foster
Elad Hazan
38
3
0
08 Dec 2023
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Bill Yuchen Lin
Abhilasha Ravichander
Ximing Lu
Nouha Dziri
Melanie Sclar
Khyathi Raghavi Chandu
Chandra Bhagavatula
Yejin Choi
22
169
0
04 Dec 2023
Self-Contradictory Reasoning Evaluation and Detection
Ziyi Liu
Isabelle Lee
Yongkang Du
Soumya Sanyal
Jieyu Zhao
LRM
30
2
0
16 Nov 2023
Leveraging Large Language Models for Collective Decision-Making
Marios Papachristou
Longqi Yang
Chin-Chia Hsu
LLMAG
39
2
0
03 Nov 2023
Multi-Agent Consensus Seeking via Large Language Models
Huaben Chen
Wenkang Ji
Lufeng Xu
Shiyu Zhao
LM&Ro
LLMAG
67
20
0
31 Oct 2023
ChoiceMates: Supporting Unfamiliar Online Decision-Making with Multi-Agent Conversational Interactions
Jeongeon Park
Bryan Min
Xiaojuan Ma
Juho Kim
54
16
0
02 Oct 2023
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
Qiushi Sun
Zhangyue Yin
Xiang Li
Zhiyong Wu
Xipeng Qiu
Lingpeng Kong
LRM
LLMAG
28
44
0
30 Sep 2023
Are Large Language Models Really Robust to Word-Level Perturbations?
Haoyu Wang
Guozheng Ma
Cong Yu
Ning Gui
Linrui Zhang
...
Sen Zhang
Li Shen
Xueqian Wang
Peilin Zhao
Dacheng Tao
KELM
31
22
0
20 Sep 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas Griffiths
LLMAG
LM&Ro
56
154
0
05 Sep 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
46
3
0
08 Aug 2023
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Yilun Du
Shuang Li
Antonio Torralba
J. Tenenbaum
Igor Mordatch
LLMAG
LRM
79
618
0
23 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
229
579
0
03 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
244
1,764
0
07 Apr 2023
Large Language Models are Diverse Role-Players for Summarization Evaluation
Ning Wu
Ming Gong
Linjun Shou
Shining Liang
Daxin Jiang
75
65
0
27 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
372
12,081
0
04 Mar 2022