2212.07981
Cited By
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
15 December 2022
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
Ruilin Han
Simeng Han
Shafiq R. Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
Papers citing
"Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation"
18 / 118 papers shown
Title
WiCE: Real-World Entailment for Claims in Wikipedia
Ryo Kamoi
Tanya Goyal
Juan Diego Rodriguez
Greg Durrett
26
80
0
02 Mar 2023
GPTScore: Evaluate as You Desire
Jinlan Fu
See-Kiong Ng
Zhengbao Jiang
Pengfei Liu
LM&MA
ALM
ELM
15
264
0
08 Feb 2023
LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control
Yilun Zhao
Zhenting Qi
Linyong Nan
Lorenzo Jaime Yu Flores
Dragomir R. Radev
LMTD
13
18
0
06 Feb 2023
Benchmarking Large Language Models for News Summarization
Tianyi Zhang
Faisal Ladhak
Esin Durmus
Percy Liang
Kathleen McKeown
Tatsunori B. Hashimoto
ELM
23
478
0
31 Jan 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna
Erin Bransom
Bailey Kuehl
Mohit Iyyer
Pradeep Dasigi
Arman Cohan
Kyle Lo
14
89
0
30 Jan 2023
The Next Chapter: A Study of Large Language Models in Storytelling
Zhuohan Xie
Trevor Cohn
Jey Han Lau
28
42
0
24 Jan 2023
Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization
Artidoro Pagnoni
Alexander R. Fabbri
Wojciech Kryściński
Chien-Sheng Wu
RALM
29
18
0
20 Dec 2022
On the Limitations of Reference-Free Evaluations of Generated Text
Daniel Deutsch
Rotem Dror
Dan Roth
34
45
0
22 Oct 2022
Marvista: Exploring the Design of a Human-AI Collaborative News Reading Tool
Xiang 'Anthony' Chen
Chien-Sheng Wu
Lidiya Murakhovs'ka
Philippe Laban
Tong Niu
Wenhao Liu
Caiming Xiong
SyDa
16
9
0
18 Jul 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Lavinia Dunagan
Jacob Morrison
Alexander R. Fabbri
Yejin Choi
Noah A. Smith
51
39
0
08 Dec 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
213
1,657
0
15 Oct 2021
Finding a Balanced Degree of Automation for Summary Evaluation
Shiyue Zhang
Joey Tianyi Zhou
49
43
0
23 Sep 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
228
305
0
27 Apr 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
254
285
0
02 Feb 2021
CTRLsum: Towards Generic Controllable Text Summarization
Junxian He
Wojciech Kryściński
Bryan McCann
Nazneen Rajani
Caiming Xiong
216
138
0
08 Dec 2020
GO FIGURE: A Meta Evaluation of Factuality in Summarization
Saadia Gabriel
Asli Celikyilmaz
Rahul Jha
Yejin Choi
Jianfeng Gao
HILM
238
96
0
24 Oct 2020
With Little Power Comes Great Responsibility
Dallas Card
Peter Henderson
Urvashi Khandelwal
Robin Jia
Kyle Mahowald
Dan Jurafsky
230
115
0
13 Oct 2020