Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics

Nitika Mathur, Tim Baldwin, Trevor Cohn
11 June 2020

Papers citing "Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics"

17 of 67 citing papers are listed below.
  • How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation (10 Jun 2021). Swaroop Mishra, Anjana Arunkumar.
  • Evaluating the Efficacy of Summarization Evaluation across Languages (02 Jun 2021). Fajri Koto, Jey Han Lau, Timothy Baldwin.
  • Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort (27 May 2021). Vânia Mendonça, Ricardo Rei, Luísa Coheur, Alberto Sardinha, Ana Lúcia Santos (INESC-ID Lisboa).
  • OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics (19 May 2021). Jian Guan, Zhexin Zhang, Zhuoer Feng, Zitao Liu, Wenbiao Ding, Xiaoxi Mao, Changjie Fan, Minlie Huang.
  • Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark (30 Apr 2021). Nouha Dziri, Hannah Rashkin, Tal Linzen, David Reitter. [ALM]
  • Reward Optimization for Neural Machine Translation with Learned Metrics (15 Apr 2021). Raphael Shu, Kang Min Yoo, Jung-Woo Ha.
  • The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics (02 Feb 2021). Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, ..., Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou. [VLM]
  • SemMT: A Semantic-based Testing Approach for Machine Translation Systems (03 Dec 2020). Jialun Cao, Meiziniu Li, Yeting Li, Ming Wen, Haiming Chen.
  • A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems (08 Nov 2020). Craig Thomson, Ehud Reiter.
  • Unbabel's Participation in the WMT20 Metrics Shared Task (29 Oct 2020). Ricardo Rei, Craig Alan Stewart, Catarina Farinha, A. Lavie.
  • Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale (26 Oct 2020). Ozan Caglayan, Pranava Madhyastha, Lucia Specia. [ELM]
  • A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images (20 Oct 2020). Pablo Messina, Pablo Pino, Denis Parra, Alvaro Soto, Cecilia Besa, S. Uribe, Marcelo Andía, C. Tejos, Claudia Prieto, Daniel Capurro. [MedIm]
  • KoBE: Knowledge-Based Machine Translation Evaluation (23 Sep 2020). Zorik Gekhman, Roee Aharoni, Genady Beryozkin, Markus Freitag, Wolfgang Macherey.
  • A Survey of Evaluation Metrics Used for NLG Systems (27 Aug 2020). Ananya B. Sai, Akash Kumar Mohankumar, Mitesh M. Khapra. [ELM]
  • Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR (20 Aug 2020). Juri Opitz, Anette Frank.
  • Evaluation of Text Generation: A Survey (26 Jun 2020). Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao. [ELM, LM&MA]
  • Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation (06 Nov 2019). Nikolay Bogoychev, Rico Sennrich.