
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling

12 June 2024
Jie Ruan, Xiao Pu, Mingqi Gao, Xiaojun Wan, Yuesheng Zhu

Papers citing "Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling"

9 papers shown
We need to talk about random seeds
Steven Bethard
24 Oct 2022
Transparent Human Evaluation for Image Captioning
Jungo Kasai, Keisuke Sakaguchi, Lavinia Dunagan, Jacob Morrison, Ronan Le Bras, Yejin Choi, Noah A. Smith
17 Nov 2021
BARTScore: Evaluating Generated Text as Text Generation
Weizhe Yuan, Graham Neubig, Pengfei Liu
22 Jun 2021
Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort
Vânia Mendonça, Ricardo Rei, Luísa Coheur, Alberto Sardinha, Ana Lúcia Santos
27 May 2021
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
Jian Guan, Zhexin Zhang, Zhuoer Feng, Zitao Liu, Wenbiao Ding, Xiaoxi Mao, Changjie Fan, Minlie Huang
19 May 2021
Re-evaluating Evaluation in Text Summarization
Manik Bhandari, Pranav Narayan Gour, A. Ashfaq, Pengfei Liu, Graham Neubig
14 Oct 2020
Unifying Human and Statistical Evaluation for Natural Language Generation
Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang
04 Apr 2019
Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies
Max Grusky, Mor Naaman, Yoav Artzi
30 Apr 2018
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt, E. Krahmer
29 Mar 2017