Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2109.05771
Cited By
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
13 September 2021
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Perturbation CheckLists for Evaluating NLG Evaluation Metrics"
16 / 16 papers shown
Title
Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics
Ameya Godbole
Robin Jia
HILM
53
1
0
24 Jan 2025
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang
Jiayi Yuan
Yu-Neng Chuang
Zhuoer Wang
Yingchi Liu
Mark Cusick
Param Kulkarni
Zhengping Ji
Yasser Ibrahim
Xia Hu
LM&MA
ELM
47
3
0
25 Aug 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
60
29
0
02 Feb 2024
Three Ways of Using Large Language Models to Evaluate Chat
Ondvrej Plátek
Vojtvech Hudevcek
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
ALM
19
6
0
12 Aug 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
57
9
0
23 May 2023
BMX: Boosting Natural Language Generation Metrics with Explainability
Christoph Leiter
Hoang-Quan Nguyen
Steffen Eger
ELM
18
0
0
20 Dec 2022
DEMETR: Diagnosing Evaluation Metrics for Translation
Marzena Karpinska
N. Raj
Katherine Thai
Yixiao Song
Ankita Gupta
Mohit Iyyer
26
37
0
25 Oct 2022
Towards explainable evaluation of language models on the semantic similarity of visual concepts
Maria Lymperaiou
George Manoliadis
Orfeas Menis-Mastromichalakis
Edmund Dervakos
Giorgos Stamou
AAML
16
5
0
08 Sep 2022
Layer or Representation Space: What makes BERT-based Evaluation Metrics Robust?
Doan Nam Long Vu
N. Moosavi
Steffen Eger
21
9
0
06 Sep 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
Longxuan Ma
Ziyu Zhuang
Weinan Zhang
Mingda Li
Ting Liu
26
4
0
17 Aug 2022
Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview
Florian Karl
Tobias Pielok
Julia Moosbauer
Florian Pfisterer
Stefan Coors
...
Jakob Richter
Michel Lang
Eduardo C. Garrido-Merchán
Juergen Branke
B. Bischl
AI4CE
26
56
0
15 Jun 2022
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
99
26
0
13 May 2022
Towards Explainable Evaluation Metrics for Natural Language Generation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
AAML
ELM
24
20
0
21 Mar 2022
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
Wei-Ye Zhao
Michael Strube
Steffen Eger
21
37
0
26 Jan 2022
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
254
285
0
02 Feb 2021
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
33
228
0
27 Aug 2020
1