ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.12055
  4. Cited By
Are LLM-based Evaluators Confusing NLG Quality Criteria?

Are LLM-based Evaluators Confusing NLG Quality Criteria?

19 February 2024
Xinyu Hu
Mingqi Gao
Sen Hu
Yang Zhang
Yicheng Chen
Teng Xu
Xiaojun Wan
    AAML
    ELM
ArXivPDFHTML

Papers citing "Are LLM-based Evaluators Confusing NLG Quality Criteria?"

10 / 10 papers shown
Title
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Zhaoxin Fan
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
2
0
26 Apr 2025
Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset
Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset
Diana Galván-Sosa
Gabrielle Gaudeau
Pride Kavumba
Yunmeng Li
Hongyi gu
Zheng Yuan
Keisuke Sakaguchi
P. Buttery
LRM
35
0
0
31 Mar 2025
Decision Information Meets Large Language Models: The Future of Explainable Operations Research
Decision Information Meets Large Language Models: The Future of Explainable Operations Research
Yansen Zhang
Qingcan Kang
Wing-Yin Yu
Hailei Gong
Xiaojin Fu
Xiongwei Han
Tao Zhong
Chen Ma
OffRL
59
1
0
14 Feb 2025
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang
Yufei Wang
Tiezheng YU
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
31
4
0
07 Oct 2024
DHP Benchmark: Are LLMs Good NLG Evaluators?
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang
Jiayi Yuan
Yu-Neng Chuang
Zhuoer Wang
Yingchi Liu
Mark Cusick
Param Kulkarni
Zhengping Ji
Yasser Ibrahim
Xia Hu
LM&MA
ELM
49
3
0
25 Aug 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
61
55
0
18 Jun 2024
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation
Peiyuan Gong
Jiaxin Mao
ELM
54
10
0
16 Dec 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
226
572
0
03 May 2023
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
113
57
0
13 Sep 2021
Teaching Machines to Read and Comprehend
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
178
3,510
0
10 Jun 2015
1