Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics

Nitika Mathur, Tim Baldwin, Trevor Cohn
11 June 2020

Papers citing "Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics"

17 of 67 citing papers are listed below.
  • How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation (10 Jun 2021). Swaroop Mishra, Anjana Arunkumar.
  • Evaluating the Efficacy of Summarization Evaluation across Languages (02 Jun 2021). Fajri Koto, Jey Han Lau, Timothy Baldwin.
  • Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort (27 May 2021). Vânia Mendonça, Ricardo Rei, Luísa Coheur, Alberto Sardinha, Ana Lúcia Santos (INESC-ID Lisboa).
  • OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics (19 May 2021). Jian Guan, Zhexin Zhang, Zhuoer Feng, Zitao Liu, Wenbiao Ding, Xiaoxi Mao, Changjie Fan, Minlie Huang.
  • Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark (30 Apr 2021). Nouha Dziri, Hannah Rashkin, Tal Linzen, David Reitter. [ALM]
  • Reward Optimization for Neural Machine Translation with Learned Metrics (15 Apr 2021). Raphael Shu, Kang Min Yoo, Jung-Woo Ha.
  • The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics (02 Feb 2021). Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, ..., Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou. [VLM]
  • SemMT: A Semantic-based Testing Approach for Machine Translation Systems (03 Dec 2020). Jialun Cao, Meiziniu Li, Yeting Li, Ming Wen, Haiming Chen.
  • A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems (08 Nov 2020). Craig Thomson, Ehud Reiter.
  • Unbabel's Participation in the WMT20 Metrics Shared Task (29 Oct 2020). Ricardo Rei, Craig Alan Stewart, Catarina Farinha, A. Lavie.
  • Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale (26 Oct 2020). Ozan Caglayan, Pranava Madhyastha, Lucia Specia. [ELM]
  • A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images (20 Oct 2020). Pablo Messina, Pablo Pino, Denis Parra, Alvaro Soto, Cecilia Besa, S. Uribe, Marcelo Andía, C. Tejos, Claudia Prieto, Daniel Capurro. [MedIm]
  • KoBE: Knowledge-Based Machine Translation Evaluation (23 Sep 2020). Zorik Gekhman, Roee Aharoni, Genady Beryozkin, Markus Freitag, Wolfgang Macherey.
  • A Survey of Evaluation Metrics Used for NLG Systems (27 Aug 2020). Ananya B. Sai, Akash Kumar Mohankumar, Mitesh M. Khapra. [ELM]
  • Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR (20 Aug 2020). Juri Opitz, Anette Frank.
  • Evaluation of Text Generation: A Survey (26 Jun 2020). Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao. [ELM, LM&MA]
  • Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation (06 Nov 2019). Nikolay Bogoychev, Rico Sennrich.