2104.14478
Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation
29 April 2021
Markus Freitag
George F. Foster
David Grangier
Viresh Ratnakar
Qijun Tan
Wolfgang Macherey
Papers citing
"Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation"
44 / 94 papers shown
Title
Reward Gaming in Conditional Text Generation
Richard Yuanzhe Pang
Vishakh Padmakumar
Thibault Sellam
Ankur P. Parikh
He He
35
24
0
16 Nov 2022
HilMeMe: A Human-in-the-Loop Machine Translation Evaluation Metric Looking into Multi-Word Expressions
Lifeng Han
12
2
0
09 Nov 2022
Dialect-robust Evaluation of Generated Text
Jiao Sun
Thibault Sellam
Elizabeth Clark
Tu Vu
Timothy Dozat
Dan Garrette
Aditya Siddhant
Jacob Eisenstein
Sebastian Gehrmann
30
19
0
02 Nov 2022
ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics
Chantal Amrhein
Nikita Moghe
Liane Guillou
ELM
39
22
0
27 Oct 2022
On the Effectiveness of Automated Metrics for Text Generation Systems
Pius von Däniken
Jan Deriu
Don Tuggener
Mark Cieliebak
33
3
0
24 Oct 2022
Searching for a higher power in the human evaluation of MT
Johnny Tian-Zheng Wei
Tom Kocmi
C. Federmann
20
6
0
20 Oct 2022
Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task
Keqin Bao
Boyi Deng
Dayiheng Liu
Baosong Yang
Wenqiang Lei
Xiangnan He
Derek F. Wong
Jun Xie
17
5
0
18 Oct 2022
Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task
Boyi Deng
Keqin Bao
Dayiheng Liu
Baosong Yang
Derek F. Wong
Lidia S. Chao
Wenqiang Lei
Jun Xie
32
9
0
18 Oct 2022
DICTDIS: Dictionary Constrained Disambiguation for Improved NMT
Ayush Maheshwari
Piyush Sharma
Preethi Jyothi
Ganesh Ramakrishnan
36
3
0
13 Oct 2022
Toxicity in Multilingual Machine Translation at Scale
Marta R. Costa-jussá
Eric Michael Smith
C. Ropers
Daniel Licht
Jean Maillard
Javier Ferrando
Carlos Escolano
30
25
0
06 Oct 2022
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task
Ricardo Rei
Marcos Vinícius Treviso
Nuno M. Guerreiro
Chrysoula Zerva
Ana C. Farinha
...
T. Glushkova
Duarte M. Alves
A. Lavie
Luísa Coheur
André F. T. Martins
63
144
0
13 Sep 2022
Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement
Zhen Yang
Fandong Meng
Yuanmeng Yan
Jie Zhou
39
3
0
13 Sep 2022
Automatic Correction of Human Translations
Jessy Lin
G. Kovács
Aditya Shastry
Joern Wuebker
John DeNero
39
3
0
17 Jun 2022
Resolving the Human Subjects Status of Machine Learning's Crowdworkers
Divyansh Kaushik
Zachary Chase Lipton
A. London
25
2
0
08 Jun 2022
Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Hao Peng
Ximing Lu
Dragomir R. Radev
Yejin Choi
Noah A. Smith
SyDa
27
4
0
19 May 2022
Consistent Human Evaluation of Machine Translation across Language Pairs
Daniel Licht
Cynthia Gao
Janice Lam
Francisco Guzman
Mona T. Diab
Philipp Koehn
40
17
0
17 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
41
5
0
13 May 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhehuai Chen
Yonghui Wu
Macduff Hughes
56
98
0
09 May 2022
Quality-Aware Decoding for Neural Machine Translation
Patrick Fernandes
António Farinhas
Ricardo Rei
José G. C. de Souza
Perez Ogayo
Graham Neubig
André F. T. Martins
43
57
0
02 May 2022
The Cross-lingual Conversation Summarization Challenge
Yulong Chen
Ming Zhong
Xuefeng Bai
Naihao Deng
Jing Li
Xianchao Zhu
Yue Zhang
22
9
0
01 May 2022
Toward More Effective Human Evaluation for Machine Translation
Belén Saldías
George F. Foster
Markus Freitag
Qijun Tan
25
10
0
11 Apr 2022
On Decoding Strategies for Neural Text Generators
Gian Wiher
Clara Meister
Ryan Cotterell
30
65
0
29 Mar 2022
Towards Explainable Evaluation Metrics for Natural Language Generation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei Zhao
Yang Gao
Steffen Eger
AAML
ELM
30
20
0
21 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
34
21
0
18 Mar 2022
Onception: Active Learning with Expert Advice for Real World Machine Translation
Vânia Mendonça
Ricardo Rei
Luísa Coheur
Alberto Sardinha
38
6
0
09 Mar 2022
As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning
Jannis Vamvas
Rico Sennrich
30
19
0
03 Mar 2022
UDAAN: Machine Learning based Post-Editing tool for Document Translation
Ayush Maheshwari
A. Ravindran
Venkatapathy Subramanian
Ganesh Ramakrishnan
26
7
0
03 Mar 2022
Data Scaling Laws in NMT: The Effect of Noise and Architecture
Yamini Bansal
Behrooz Ghorbani
Ankush Garg
Biao Zhang
M. Krikun
Colin Cherry
Behnam Neyshabur
Orhan Firat
42
47
0
04 Feb 2022
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
Wei Zhao
Michael Strube
Steffen Eger
27
37
0
26 Jan 2022
High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics
Markus Freitag
David Grangier
Qijun Tan
Bowen Liang
33
92
0
17 Nov 2021
Transparent Human Evaluation for Image Captioning
Jungo Kasai
Keisuke Sakaguchi
Lavinia Dunagan
Jacob Morrison
Ronan Le Bras
Yejin Choi
Noah A. Smith
33
47
0
17 Nov 2021
Learning Compact Metrics for MT
Amy Pu
Hyung Won Chung
Ankur P. Parikh
Sebastian Gehrmann
Thibault Sellam
35
99
0
12 Oct 2021
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results
M. Fomicheva
Piyawat Lertvittayakumjorn
Wei Zhao
Steffen Eger
Yang Gao
ELM
24
39
0
08 Oct 2021
Non-Parametric Online Learning from Human Feedback for Neural Machine Translation
Dongqi Wang
Hao-Ran Wei
Zhirui Zhang
Shujian Huang
Jun Xie
Jiajun Chen
OffRL
57
15
0
23 Sep 2021
Multilingual Document-Level Translation Enables Zero-Shot Transfer From Sentences to Documents
Biao Zhang
Ankur Bapna
Melvin Johnson
A. Dabirmoghaddam
N. Arivazhagan
Orhan Firat
34
12
0
21 Sep 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
116
58
0
13 Sep 2021
On the Challenges of Evaluating Compositional Explanations in Multi-Hop Inference: Relevance, Completeness, and Expert Ratings
Peter Alexander Jansen
Kelly Smith
Dan Moreno
Huitzilin Ortiz
CoGe
ReLM
LRM
33
10
0
07 Sep 2021
Survey of Low-Resource Machine Translation
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindřich Helcl
Alexandra Birch
AIMat
39
150
0
01 Sep 2021
Translation Error Detection as Rationale Extraction
M. Fomicheva
Lucia Specia
Nikolaos Aletras
21
23
0
27 Aug 2021
Underreporting of errors in NLG output, and what to do about it
Emiel van Miltenburg
Miruna Clinciu
Ondřej Dušek
Dimitra Gkatzia
Stephanie Inglis
...
Saad Mahamood
Emma Manning
S. Schoch
Craig Thomson
Luou Wen
27
38
0
02 Aug 2021
To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation
Tom Kocmi
C. Federmann
Roman Grundkiewicz
Marcin Junczys-Dowmunt
Hitokazu Matsushita
Arul Menezes
31
204
0
22 Jul 2021
Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort
Vânia Mendonça
Ricardo Rei
Luísa Coheur
Alberto Sardinha
Ana Lúcia Santos
20
6
0
27 May 2021
RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling
Yizhe Zhang
Siqi Sun
Xiang Gao
Yuwei Fang
Chris Brockett
Michel Galley
Jianfeng Gao
Bill Dolan
RALM
38
30
0
14 May 2021
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri
Hannah Rashkin
Tal Linzen
David Reitter
ALM
206
79
0
30 Apr 2021