BLEU might be Guilty but References are not Innocent

13 April 2020
Markus Freitag
David Grangier
Isaac Caswell
arXiv: 2004.06063

Papers citing "BLEU might be Guilty but References are not Innocent"

50 / 53 papers shown
TRAIL: Trace Reasoning and Agentic Issue Localization
Darshan Deshpande
Varun Gangal
Hersh Mehta
Jitin Krishnan
Anand Kannappan
Rebecca Qian
54
0
0
13 May 2025
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang
Yufei Wang
Tiezheng YU
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
44
5
0
07 Oct 2024
AI-Assisted Human Evaluation of Machine Translation
Vilém Zouhar
Tom Kocmi
Mrinmaya Sachan
57
5
0
18 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
114
35
0
09 Jun 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu
Jiahao Xu
Yulin Yuan
Gholamreza Haffari
Longyue Wang
Weihua Luo
Kaifu Zhang
LLMAG
119
24
0
20 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
62
7
0
09 May 2024
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe
Sunayana Rane
Zachary Berger
Yonatan Bitton
Jaemin Cho
...
Zarana Parekh
Jordi Pont-Tuset
Garrett Tanzer
Su Wang
Jason Baldridge
46
52
0
30 Apr 2024
MT-Ranker: Reference-free machine translation evaluation by inter-system ranking
Ibraheem Muhammad Moosa
Rui Zhang
Wenpeng Yin
40
5
0
30 Jan 2024
Aligning Translation-Specific Understanding to General Understanding in Large Language Models
Yi-Chong Huang
Xiaocheng Feng
Baohang Li
Chengpeng Fu
Wenshuai Huo
Ting Liu
Bing Qin
35
0
0
10 Jan 2024
Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model
Christian Tomani
David Vilar
Markus Freitag
Colin Cherry
Subhajit Naskar
Mara Finkelstein
Xavier Garcia
Daniel Cremers
33
7
0
10 Oct 2023
Improving Language Model Integration for Neural Machine Translation
Christian Herold
Yingbo Gao
Mohammad Zeineldeen
Hermann Ney
34
2
0
08 Jun 2023
Breeding Machine Translations: Evolutionary approach to survive and thrive in the world of automated evaluation
Josef Jon
Ondrej Bojar
40
10
0
30 May 2023
Improving Metrics for Speech Translation
Claudio Paonessa
Dominik Frefel
Manfred Vogel
39
1
0
22 May 2023
How Good are Commercial Large Language Models on African Languages?
Jessica Ojo
Kelechi Ogueji
40
5
0
11 May 2023
Binarized Neural Machine Translation
Yichi Zhang
Ankush Garg
Yuan Cao
Lukasz Lew
Behrooz Ghorbani
Zhiru Zhang
Orhan Firat
MQ
41
14
0
09 Feb 2023
Benchmarking Large Language Models for News Summarization
Tianyi Zhang
Faisal Ladhak
Esin Durmus
Percy Liang
Kathleen McKeown
Tatsunori B. Hashimoto
ELM
54
497
0
31 Jan 2023
Extrinsic Evaluation of Machine Translation Metrics
Nikita Moghe
Tom Sherborne
Mark Steedman
Alexandra Birch
ELM
38
18
0
20 Dec 2022
Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
Qingyu Lu
Liang Ding
Liping Xie
Kanjian Zhang
Derek F. Wong
Dacheng Tao
ELM
ALM
58
14
0
20 Dec 2022
Understanding Translationese in Cross-Lingual Summarization
Jiaan Wang
Fandong Meng
Yunlong Liang
Tingyi Zhang
Jiarong Xu
Zhixu Li
Jie Zhou
46
15
0
14 Dec 2022
Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality?
Pei Zhang
Baosong Yang
Hao-Ran Wei
Dayiheng Liu
Kai Fan
Luo Si
Jun Xie
26
3
0
25 Nov 2022
Conciseness: An Overlooked Language Task
Felix Stahlberg
Aashish Kumar
Chris Alberti
Shankar Kumar
26
1
0
08 Nov 2022
DEMETR: Diagnosing Evaluation Metrics for Translation
Marzena Karpinska
N. Raj
Katherine Thai
Yixiao Song
Ankita Gupta
Mohit Iyyer
34
38
0
25 Oct 2022
Rethinking Round-Trip Translation for Machine Translation Evaluation
Terry Yue Zhuo
Xingliang Yuan
Xuanli He
Trevor Cohn
LRM
29
2
0
15 Sep 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
101
75
0
25 May 2022
Lack of Fluency is Hurting Your Translation Model
J. Yoo
Jaewoo Kang
46
0
0
24 May 2022
Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Hao Peng
Ximing Lu
Dragomir R. Radev
Yejin Choi
Noah A. Smith
SyDa
32
4
0
19 May 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhiwen Chen
Yonghui Wu
Macduff Hughes
56
99
0
09 May 2022
Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance
Jingwei Ni
Zhijing Jin
Markus Freitag
Mrinmaya Sachan
Bernhard Schölkopf
40
14
0
04 May 2022
UniTE: Unified Translation Evaluation
Boyi Deng
Dayiheng Liu
Baosong Yang
Haibo Zhang
Boxing Chen
Derek F. Wong
Lidia S. Chao
41
41
0
28 Apr 2022
Towards Explainable Evaluation Metrics for Natural Language Generation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei Zhao
Yang Gao
Steffen Eger
AAML
ELM
43
20
0
21 Mar 2022
Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation
Zhiwei He
Xing Wang
Rui Wang
Shuming Shi
Zhaopeng Tu
44
12
0
16 Mar 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM
AI4CE
89
185
0
14 Feb 2022
Data Scaling Laws in NMT: The Effect of Noise and Architecture
Yamini Bansal
Behrooz Ghorbani
Ankush Garg
Biao Zhang
M. Krikun
Colin Cherry
Behnam Neyshabur
Orhan Firat
47
47
0
04 Feb 2022
High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics
Markus Freitag
David Grangier
Qijun Tan
Bowen Liang
38
92
0
17 Nov 2021
SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets
Ann Yuan
Daphne Ippolito
Vitaly Nikolaev
Chris Callison-Burch
Andy Coenen
Sebastian Gehrmann
SyDa
117
20
0
11 Nov 2021
Better than Average: Paired Evaluation of NLP Systems
Maxime Peyrard
Wei Zhao
Steffen Eger
Robert West
ELM
36
24
0
20 Oct 2021
Control Prefixes for Parameter-Efficient Text Generation
Jordan Clive
Kris Cao
Marek Rei
59
32
0
15 Oct 2021
Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits
Julia Kreutzer
David Vilar
Artem Sokolov
49
15
0
13 Oct 2021
Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors
Marvin Kaster
Wei Zhao
Steffen Eger
50
24
0
08 Oct 2021
Scaling Laws for Neural Machine Translation
Behrooz Ghorbani
Orhan Firat
Markus Freitag
Ankur Bapna
M. Krikun
Xavier Garcia
Ciprian Chelba
Colin Cherry
42
99
0
16 Sep 2021
To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation
Tom Kocmi
C. Federmann
Roman Grundkiewicz
Marcin Junczys-Dowmunt
Hitokazu Matsushita
Arul Menezes
51
204
0
22 Jul 2021
How well do you know your summarization datasets?
Priyam Tejaswin
Dhruv Naik
Peng Liu
42
26
0
21 Jun 2021
On the Language Coverage Bias for Neural Machine Translation
Shuo Wang
Zhaopeng Tu
Zhixing Tan
Shuming Shi
Maosong Sun
Yang Liu
22
19
0
07 Jun 2021
The statistical advantage of automatic NLG metrics at the system level
Johnny Tian-Zheng Wei
Robin Jia
28
23
0
26 May 2021
Focus Attention: Promoting Faithfulness and Diversity in Summarization
Rahul Aralikatte
Shashi Narayan
Joshua Maynez
S. Rothe
Ryan T. McDonald
40
45
0
25 May 2021
Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation
Markus Freitag
George F. Foster
David Grangier
Viresh Ratnakar
Qijun Tan
Wolfgang Macherey
35
381
0
29 Apr 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
260
285
0
02 Feb 2021
Unbabel's Participation in the WMT20 Metrics Shared Task
Ricardo Rei
Craig Alan Stewart
Catarina Farinha
A. Lavie
18
79
0
29 Oct 2020
Reformulating Unsupervised Style Transfer as Paraphrase Generation
Kalpesh Krishna
John Wieting
Mohit Iyyer
35
238
0
12 Oct 2020
KoBE: Knowledge-Based Machine Translation Evaluation
Zorik Gekhman
Roee Aharoni
Genady Beryozkin
Markus Freitag
Wolfgang Macherey
43
15
0
23 Sep 2020