2506.14335
Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics
17 June 2025
Silvia Casola
Yang Liu
Siyao Peng
Oliver Kraus
Albert Gatt
Barbara Plank
Papers citing
"Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics"
26 / 26 papers shown
GUM-SAGE: A Novel Dataset and Approach for Graded Entity Salience Prediction
Jessica Lin
Amir Zeldes
49
1
0
15 Apr 2025
Do LLMs write like humans? Variation in grammatical and rhetorical styles
Alex Reinhart
David West Brown
Ben Markey
Michael Laudenbach
Kachatad Pantusen
Ronald Yurko
Gordon Weinberg
DeLMO
58
10
0
21 Oct 2024
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Patrícia Schmidtová
Saad Mahamood
Simone Balloccu
Ondřej Dušek
Albert Gatt
Dimitra Gkatzia
David M. Howcroft
Ondřej Plátek
Adarsa Sivaprasad
69
4
0
17 Aug 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
66
34
0
01 Jul 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Zhen Li
Xiaohan Xu
Tao Shen
Can Xu
Jia-Chen Gu
Yuxuan Lai
Chongyang Tao
Shuai Ma
LM&MA
ELM
94
15
0
13 Jan 2024
Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies
Tom Kocmi
Vilém Zouhar
C. Federmann
Matt Post
63
31
0
12 Jan 2024
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
Tianyi Tang
Hongyuan Lu
Yuchen Eleanor Jiang
Haoyang Huang
Dongdong Zhang
Wayne Xin Zhao
Tom Kocmi
Furu Wei
40
7
0
24 May 2023
What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability
Mario Giulianelli
Joris Baan
Wilker Aziz
Raquel Fernández
Barbara Plank
UQLM
60
32
0
19 May 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
176
1,205
0
29 Mar 2023
News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal
Junyi Jessy Li
Greg Durrett
ELM
110
409
0
26 Sep 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM
AI4CE
138
193
0
14 Feb 2022
BARTScore: Evaluating Generated Text as Text Generation
Weizhe Yuan
Graham Neubig
Pengfei Liu
109
843
0
22 Jun 2021
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
ELM
75
36
0
26 Oct 2020
SummEval: Re-evaluating Summarization Evaluation
Alexander R. Fabbri
Wojciech Kryściński
Bryan McCann
Caiming Xiong
R. Socher
Dragomir R. Radev
HILM
97
719
0
24 Jul 2020
SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
Daniel Deutsch
Dan Roth
83
26
0
10 Jul 2020
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
112
387
0
26 Jun 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
101
1,501
0
09 Apr 2020
Fill in the BLANC: Human-free quality estimation of document summaries
Oleg V. Vasilyev
Vedant Dharnidharka
John Bohannon
3DH
80
119
0
23 Feb 2020
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao
Maxime Peyrard
Fei Liu
Yang Gao
Christian M. Meyer
Steffen Eger
181
602
0
05 Sep 2019
Answers Unite! Unsupervised Metrics for Reinforced Summarization Models
Thomas Scialom
Sylvain Lamprier
Benjamin Piwowarski
Jacopo Staiano
73
149
0
04 Sep 2019
Neural Text Summarization: A Critical Evaluation
Wojciech Kryściński
N. Keskar
Bryan McCann
Caiming Xiong
R. Socher
81
367
0
23 Aug 2019
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
332
5,845
0
21 Apr 2019
A Call for Clarity in Reporting BLEU Scores
Matt Post
170
2,994
0
23 Apr 2018
Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond
Ramesh Nallapati
Bowen Zhou
Cicero Nogueira dos Santos
Çağlar Gülçehre
Bing Xiang
AIMat
270
2,564
0
19 Feb 2016
Better Summarization Evaluation with Word Embeddings for ROUGE
Jun-Ping Ng
Viktoria Abrecht
61
172
0
25 Aug 2015
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomáš Kočiský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
347
3,551
0
10 Jun 2015