Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.06935
Cited By
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
14 February 2022
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM
AI4CE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text"
18 / 68 papers shown
Title
A Framework for Deprecating Datasets: Standardizing Documentation, Identification, and Communication
A. Luccioni
Frances Corry
H. Sridharan
Mike Ananny
J. Schultz
Kate Crawford
54
29
0
18 Oct 2021
Finding a Balanced Degree of Automation for Summary Evaluation
Shiyue Zhang
Joey Tianyi Zhou
52
43
0
23 Sep 2021
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska
Nader Akoury
Mohit Iyyer
220
106
0
14 Sep 2021
Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP
Anna Rogers
Timothy Baldwin
Kobi Leins
104
64
0
14 Sep 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
113
57
0
13 Sep 2021
Does It Capture STEL? A Modular, Similarity-based Linguistic Style Evaluation Framework
Anna Wegmann
D. Nguyen
24
12
0
10 Sep 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
242
593
0
14 Jul 2021
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri
Hannah Rashkin
Tal Linzen
David Reitter
ALM
192
79
0
30 Apr 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
231
305
0
27 Apr 2021
Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing
Boaz Shmueli
Jan Fell
Soumya Ray
Lun-Wei Ku
108
86
0
20 Apr 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
254
285
0
02 Feb 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Joey Tianyi Zhou
Christopher Ré
AAML
OffRL
OOD
154
136
0
13 Jan 2021
GO FIGURE: A Meta Evaluation of Factuality in Summarization
Saadia Gabriel
Asli Celikyilmaz
Rahul Jha
Yejin Choi
Jianfeng Gao
HILM
238
96
0
24 Oct 2020
Factual Error Correction for Abstractive Summarization Models
Mengyao Cao
Yue Dong
Jiapeng Wu
Jackie C.K. Cheung
HILM
KELM
169
159
0
17 Oct 2020
With Little Power Comes Great Responsibility
Dallas Card
Peter Henderson
Urvashi Khandelwal
Robin Jia
Kyle Mahowald
Dan Jurafsky
230
115
0
13 Oct 2020
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
254
3,684
0
28 Feb 2017
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents
Ramesh Nallapati
Feifei Zhai
Bowen Zhou
207
1,255
0
14 Nov 2016
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
175
3,510
0
10 Jun 2015
Previous
1
2