Wider and Deeper LLM Networks are Fairer LLM Evaluators

3 August 2023
Xinghua Zhang
Bowen Yu
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
    ALM

Papers citing "Wider and Deeper LLM Networks are Fairer LLM Evaluators"

50 / 58 papers shown
Title
Internet of Agents: Fundamentals, Applications, and Challenges
Yuntao Wang
Shaolong Guo
Yanghe Pan
Zhou Su
Fahao Chen
Tom H. Luan
Peng Li
Jiawen Kang
Dusit Niyato
LLMAG
LM&Ro
AI4CE
60
0
0
12 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
2
0
26 Apr 2025
VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
Christine P. Lee
David J. Porfirio
Xinyu Jessica Wang
Kevin Zhao
Bilge Mutlu
82
1
0
25 Feb 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
132
69
0
20 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
120
67
0
25 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
78
0
0
12 Nov 2024
Bayesian Calibration of Win Rate Estimation with LLM Evaluators
Yicheng Gao
G. Xu
Zhe Wang
Arman Cohan
38
6
0
07 Nov 2024
SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents
Dawei Li
Zhen Tan
Peijia Qian
Yifan Li
Kumar Satvik Chaudhary
Lijie Hu
Jiayi Shen
52
7
0
05 Nov 2024
Do LLMs estimate uncertainty well in instruction-following?
Juyeon Heo
Miao Xiong
Christina Heinze-Deml
Jaya Narain
ELM
52
3
0
18 Oct 2024
An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
Junjie Chen
Weihang Su
Zhumin Chu
Haitao Li
Qinyao Ai
Yiqun Liu
Min Zhang
Shaoping Ma
29
3
0
16 Oct 2024
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELM
ALM
51
38
0
16 Oct 2024
ReIFE: Re-evaluating Instruction-Following Evaluation
Yixin Liu
Kejian Shi
Alexander R. Fabbri
Yilun Zhao
Peifeng Wang
Chien-Sheng Wu
Shafiq Joty
Arman Cohan
24
6
0
09 Oct 2024
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye
Yanbo Wang
Yue Huang
Dongping Chen
Qihui Zhang
...
Werner Geyer
Chao Huang
Pin-Yu Chen
Nitesh V. Chawla
Xiangliang Zhang
ELM
37
45
0
03 Oct 2024
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends
Xinghua Zhang
Haiyang Yu
Yongbin Li
Minzheng Wang
Longze Chen
Fei Huang
43
5
0
21 Sep 2024
Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing
Wenyuan Zhang
Jiawei Sheng
Shuaiyi Nie
Zefeng Zhang
Xinghua Zhang
Yongquan He
Tingwen Liu
30
4
0
18 Sep 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao
Feifan Song
Yibo Miao
Zefan Cai
Z. Yang
...
Houfeng Wang
Zhifang Sui
Peiyi Wang
Baobao Chang
50
11
0
04 Sep 2024
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
Yuhao Wu
Ming Shan Hee
Zhiqing Hu
Roy Ka-Wei Lee
RALM
33
0
0
03 Sep 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Dingyi Yang
Qin Jin
44
5
0
26 Aug 2024
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang
Jiayi Yuan
Yu-Neng Chuang
Zhuoer Wang
Yingchi Liu
Mark Cusick
Param Kulkarni
Zhengping Ji
Yasser Ibrahim
Xia Hu
LM&MA
ELM
49
3
0
25 Aug 2024
StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation
Boxi Cao
Mengjie Ren
Hongyu Lin
Xianpei Han
Feng Zhang
Junfeng Zhan
Le Sun
ELM
34
3
0
06 Aug 2024
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao
Xiaoyuan Yi
Xing Xie
ELM
ALM
38
7
0
15 Jul 2024
Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture
Jiayang Song
Yuheng Huang
Zhehua Zhou
Lei Ma
45
6
0
10 Jul 2024
OffsetBias: Leveraging Debiased Data for Tuning Evaluators
Junsoo Park
Seungyeon Jwa
Meiying Ren
Daeyoung Kim
Sanghyuk Choi
ALM
34
32
0
09 Jul 2024
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
Minzheng Wang
Longze Chen
Cheng Fu
Shengyi Liao
Xinghua Zhang
...
Run Luo
Yunshui Li
Min Yang
Fei Huang
Yongbin Li
RALM
54
44
0
25 Jun 2024
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
Sumanth Doddapaneni
Mohammed Safi Ur Rahman Khan
Sshubam Verma
Mitesh Khapra
39
11
0
19 Jun 2024
Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework
Olivier Binette
Jerome P. Reiter
38
0
0
14 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
39
0
06 Jun 2024
Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach
Yuxuan Chen
Rongpeng Li
Xiaoxue Yu
Zhifeng Zhao
Honggang Zhang
42
9
0
03 Jun 2024
Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions
Ruochen Zhao
Wenxuan Zhang
Yew Ken Chia
Deli Zhao
Lidong Bing
41
9
0
30 May 2024
Language Models can Evaluate Themselves via Probability Discrepancy
Tingyu Xia
Bowen Yu
Yuan Wu
Yi-Ju Chang
Chang Zhou
ELM
34
4
0
17 May 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
44
20
0
04 Apr 2024
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi
Zenghui Yuan
Yinuo Liu
Yue Huang
Pan Zhou
Lichao Sun
Neil Zhenqiang Gong
AAML
45
39
0
26 Mar 2024
Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment
Feifan Song
Bowen Yu
Hao Lang
Haiyang Yu
Fei Huang
Houfeng Wang
Yongbin Li
ALM
40
11
0
17 Mar 2024
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Joseph Cho
Fachrina Dewi Puspitasari
Sheng Zheng
Jingyao Zheng
Lik-Hang Lee
Tae-Ho Kim
Choong Seon Hong
Chaoning Zhang
EGVM
VGen
36
40
0
08 Mar 2024
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
Vyas Raina
Adian Liusie
Mark J. F. Gales
AAML
ELM
32
52
0
21 Feb 2024
TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning
Xiang Li
Yunshi Lan
Chao Yang
ELM
46
8
0
20 Feb 2024
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Siyuan Wang
Zhuohan Long
Zhihao Fan
Zhongyu Wei
Xuanjing Huang
LLMAG
21
26
0
18 Feb 2024
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
Cheng Qian
Bingxiang He
Zhuang Zhong
Jia Deng
Yujia Qin
...
Zhong Zhang
Jie Zhou
Yankai Lin
Zhiyuan Liu
Maosong Sun
30
28
0
14 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
60
29
0
02 Feb 2024
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Keming Lu
Bowen Yu
Chang Zhou
Jingren Zhou
52
56
0
23 Jan 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Zhen Li
Xiaohan Xu
Tao Shen
Can Xu
Jia-Chen Gu
Yuxuan Lai
Chongyang Tao
Shuai Ma
LM&MA
ELM
39
9
0
13 Jan 2024
Large Language Models for Social Networks: Applications, Challenges, and Solutions
Jingying Zeng
Richard Huang
Waleed Malik
Langxuan Yin
Bojan Babic
Danny Shacham
Xiao Yan
Jaewon Yang
Qi He
22
7
0
04 Jan 2024
GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension
Bohan Lyu
Xin Cong
Heyang Yu
Pan Yang
Yujia Qin
...
Zhong Zhang
Yukun Yan
Yankai Lin
Zhiyuan Liu
Maosong Sun
LLMAG
30
5
0
28 Dec 2023
Universal Self-Consistency for Large Language Model Generation
Xinyun Chen
Renat Aksitov
Uri Alon
Jie Jessie Ren
Kefan Xiao
Pengcheng Yin
Sushant Prakash
Charles Sutton
Xuezhi Wang
Denny Zhou
LRM
26
66
0
29 Nov 2023
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Swarnadeep Saha
Omer Levy
Asli Celikyilmaz
Mohit Bansal
Jason Weston
Xian Li
MoMe
28
71
0
23 Oct 2023
GestureGPT: Toward Zero-shot Interactive Gesture Understanding and Grounding with Large Language Model Agents
Xin Zeng
Xiaoyu Wang
Tengxiang Zhang
Chun Yu
Shengdong Zhao
Yiqiang Chen
LLMAG
LM&Ro
SLR
19
1
0
19 Oct 2023
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
Xuhui Zhou
Hao Zhu
Leena Mathur
Ruohong Zhang
Haofei Yu
...
Louis-Philippe Morency
Yonatan Bisk
Daniel Fried
Graham Neubig
Maarten Sap
LLMAG
33
118
0
18 Oct 2023
Advancing Perception in Artificial Intelligence through Principles of Cognitive Science
Palaash Agrawal
Cheston Tan
Heena Rathore
54
1
0
13 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM
ALM
41
166
0
11 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
35
76
0
09 Oct 2023