Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1807.02202
Cited By
The price of debiasing automatic metrics in natural language evaluation
6 July 2018
Arun Tejasvi Chaganty
Stephen Mussmann
Percy Liang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The price of debiasing automatic metrics in natural language evaluation"
26 / 26 papers shown
Title
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM
ALM
50
6
0
17 Oct 2024
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch
Joshua Maynez
R. A. Hofer
Bhuwan Dhingra
Amir Globerson
William W. Cohen
44
8
0
06 Jun 2024
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Chenyu You
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
24
133
0
15 Dec 2022
How Far are We from Robust Long Abstractive Summarization?
Huan Yee Koh
Jiaxin Ju
He Zhang
Ming Liu
Shirui Pan
HILM
28
39
0
30 Oct 2022
On the Effectiveness of Automated Metrics for Text Generation Systems
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
21
3
0
24 Oct 2022
Searching for a higher power in the human evaluation of MT
Johnny Tian-Zheng Wei
Tom Kocmi
C. Federmann
18
6
0
20 Oct 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
Huan Yee Koh
Jiaxin Ju
Ming Liu
Shirui Pan
81
122
0
03 Jul 2022
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
59
14
0
11 Jun 2022
Computational Storytelling and Emotions: A Survey
Yusuke Mori
Hiroaki Yamane
Yusuke Mukuta
Tatsuya Harada
40
2
0
23 May 2022
Toward More Effective Human Evaluation for Machine Translation
Belén Saldías
George F. Foster
Markus Freitag
Qijun Tan
13
10
0
11 Apr 2022
Dynamic Human Evaluation for Relative Model Comparisons
Thórhildur Thorleiksdóttir
Cédric Renggli
Nora Hollenstein
Ce Zhang
36
2
0
15 Dec 2021
Context Matters in Semantically Controlled Language Generation for Task-oriented Dialogue Systems
Ye Liu
Wolfgang Maier
Wolfgang Minker
Stefan Ultes
16
4
0
28 Nov 2021
Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees
Yaman Kumar Singla
Sriram Krishna
R. Shah
Changyou Chen
18
6
0
17 Nov 2021
Cross-Lingual Open-Domain Question Answering with Answer Sentence Generation
Benjamin Muller
Luca Soldaini
Rik Koncel-Kedziorski
Eric Lind
Alessandro Moschitti
LRM
26
7
0
14 Oct 2021
Can Audio Captions Be Evaluated with Image Caption Metrics?
Zelin Zhou
Zhiling Zhang
Xuenan Xu
Zeyu Xie
Mengyue Wu
Kenny Q. Zhu
30
42
0
10 Oct 2021
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
31
1,984
0
02 Sep 2020
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
33
230
0
27 Aug 2020
SummEval: Re-evaluating Summarization Evaluation
Alexander R. Fabbri
Wojciech Kry'sciñski
Bryan McCann
Caiming Xiong
R. Socher
Dragomir R. Radev
HILM
38
689
0
24 Jul 2020
SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
Daniel Deutsch
Dan Roth
16
26
0
10 Jul 2020
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
Esin Durmus
He He
Mona T. Diab
HILM
14
384
0
07 May 2020
Exploring Content Selection in Summarization of Novel Chapters
Faisal Ladhak
Bryan Li
Yaser Al-Onaizan
Kathleen McKeown
69
35
0
04 May 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
46
1,446
0
09 Apr 2020
Towards a Human-like Open-Domain Chatbot
Daniel De Freitas
Minh-Thang Luong
David R. So
Jamie Hall
Noah Fiedel
...
Zi Yang
Apoorv Kulshreshtha
Gaurav Nemade
Yifeng Lu
Quoc V. Le
42
924
0
27 Jan 2020
Handling Divergent Reference Texts when Evaluating Table-to-Text Generation
Bhuwan Dhingra
Manaal Faruqui
Ankur P. Parikh
Ming-Wei Chang
Dipanjan Das
William W. Cohen
27
193
0
03 Jun 2019
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
44
866
0
27 Nov 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016
1