ResearchTrend.AI

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
arXiv 2202.06935 · 14 February 2022
Sebastian Gehrmann, Elizabeth Clark, Thibault Sellam

Papers citing "Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text"

18 / 68 papers shown
A Framework for Deprecating Datasets: Standardizing Documentation, Identification, and Communication
A. Luccioni, Frances Corry, H. Sridharan, Mike Ananny, J. Schultz, Kate Crawford
18 Oct 2021

Finding a Balanced Degree of Automation for Summary Evaluation
Shiyue Zhang, Joey Tianyi Zhou
23 Sep 2021

The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska, Nader Akoury, Mohit Iyyer
14 Sep 2021

'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP
Anna Rogers, Timothy Baldwin, Kobi Leins
14 Sep 2021

Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai, Tanay Dixit, D. Y. Sheth, S. Mohan, Mitesh M. Khapra
13 Sep 2021

Does It Capture STEL? A Modular, Similarity-based Linguistic Style Evaluation Framework
Anna Wegmann, D. Nguyen
10 Sep 2021

Deduplicating Training Data Makes Language Models Better
Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
14 Jul 2021

Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri, Hannah Rashkin, Tal Linzen, David Reitter
30 Apr 2021

Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov
27 Apr 2021

Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing
Boaz Shmueli, Jan Fell, Soumya Ray, Lun-Wei Ku
20 Apr 2021

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann, Tosin P. Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, ..., Nishant Subramani, Wei-ping Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou
02 Feb 2021

Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason M. Wu, Stephan Zheng, Caiming Xiong, Joey Tianyi Zhou, Christopher Ré
13 Jan 2021

GO FIGURE: A Meta Evaluation of Factuality in Summarization
Saadia Gabriel, Asli Celikyilmaz, Rahul Jha, Yejin Choi, Jianfeng Gao
24 Oct 2020

Factual Error Correction for Abstractive Summarization Models
Mengyao Cao, Yue Dong, Jiapeng Wu, Jackie C.K. Cheung
17 Oct 2020

With Little Power Comes Great Responsibility
Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky
13 Oct 2020

Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez, Been Kim
28 Feb 2017

SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents
Ramesh Nallapati, Feifei Zhai, Bowen Zhou
14 Nov 2016

Teaching Machines to Read and Comprehend
Karl Moritz Hermann, Tomás Kociský, Edward Grefenstette, L. Espeholt, W. Kay, Mustafa Suleyman, Phil Blunsom
10 Jun 2015