2302.14520
Large Language Models Are State-of-the-Art Evaluators of Translation Quality
28 February 2023
Tom Kocmi
C. Federmann
ELM
Papers citing
"Large Language Models Are State-of-the-Art Evaluators of Translation Quality"
50 / 229 papers shown
Title
Trained MT Metrics Learn to Cope with Machine-translated References
Jannis Vamvas
Tobias Domhan
Sony Trenous
Rico Sennrich
Eva Hasler
24
1
0
01 Dec 2023
Mark My Words: Analyzing and Evaluating Language Model Watermarks
Julien Piet
Chawin Sitawarin
Vivian Fang
Norman Mu
David Wagner
WaLM
37
33
0
01 Dec 2023
Exploring Prompting Large Language Models as Explainable Metrics
Ghazaleh Mahmoudi
LRM
19
4
0
20 Nov 2023
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
Jon Saad-Falcon
Omar Khattab
Christopher Potts
Matei A. Zaharia
RALM
27
105
0
16 Nov 2023
Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations
Zilu Tang
Mayank Agarwal
Alex Shypula
Bailin Wang
Derry Wijaya
Jie Chen
Yoon Kim
LRM
37
15
0
13 Nov 2023
Word Definitions from Large Language Models
Yunting Yin
Steven Skiena
Samuel Kim
AILaw
35
0
0
10 Nov 2023
Which is better? Exploring Prompting Strategy For LLM-based Metrics
Joonghoon Kim
Saeran Park
Kiyoon Jeong
Sangmin Lee
S. Han
Jiyoon Lee
Pilsung Kang
11
15
0
07 Nov 2023
Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task
Neema Kotonya
Saran Krishnasamy
Joel R. Tetreault
Alejandro Jaimes
24
9
0
01 Nov 2023
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Tian Liang
Zhiwei He
Jen-tse Huang
Wenxuan Wang
Wenxiang Jiao
Rui Wang
Yujiu Yang
Zhaopeng Tu
Shuming Shi
Xing Wang
LLMAG
60
5
0
31 Oct 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
Christoph Leiter
Juri Opitz
Daniel Deutsch
Yang Gao
Rotem Dror
Steffen Eger
ALM
LRM
ELM
40
31
0
30 Oct 2023
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation
Minzhi Li
Taiwei Shi
Caleb Ziems
Min-Yen Kan
Nancy F. Chen
Zhengyuan Liu
Diyi Yang
29
68
0
24 Oct 2023
LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Da Song
Xuan Xie
Jiayang Song
Derui Zhu
Yuheng Huang
Felix Juefei Xu
Lei Ma
ALM
35
3
0
22 Oct 2023
Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications
Manuel Faysse
Gautier Viaud
Céline Hudelot
Pierre Colombo
32
9
0
21 Oct 2023
GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4
Tom Kocmi
C. Federmann
30
74
0
21 Oct 2023
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection
Nuno M. Guerreiro
Ricardo Rei
Daan van Stigt
Luísa Coheur
Pierre Colombo
André F.T. Martins
48
112
0
16 Oct 2023
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Peng Li
Yeye He
Dror Yashar
Weiwei Cui
Song Ge
Haidong Zhang
D. Fainman
Dongmei Zhang
Surajit Chaudhuri
ALM
LMTD
50
70
0
13 Oct 2023
Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques
Junxiao Shen
John J. Dudley
Jingyao Zheng
Bill Byrne
Per Ola Kristensson
32
3
0
12 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM
ALM
46
167
0
11 Oct 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
Weixiang Yan
Yuchen Tian
Yunzhe Li
Qian Chen
Wen Wang
34
35
0
08 Oct 2023
Learning Personalized Alignment for Evaluating Open-ended Text Generation
Danqing Wang
Kevin Kaichuang Yang
Hanlin Zhu
Xiaomeng Yang
Andrew Cohen
Lei Li
Yuandong Tian
ALM
LM&MA
17
8
0
05 Oct 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
30
13
0
29 Sep 2023
Benchmarking Cognitive Biases in Large Language Models as Evaluators
Ryan Koo
Minhwa Lee
Vipul Raheja
Jong Inn Park
Zae Myung Kim
Dongyeop Kang
ALM
43
75
0
29 Sep 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
49
31
0
23 Sep 2023
Frustrated with Code Quality Issues? LLMs can Help!
Nalin Wadhwa
Jui Pradhan
Atharv Sonwane
Surya Prakash Sahu
Nagarajan Natarajan
Aditya Kanade
Suresh Parthasarathy
S. Rajamani
35
2
0
22 Sep 2023
Automatic Answerability Evaluation for Question Generation
Zifan Wang
Kotaro Funakoshi
Manabu Okumura
34
2
0
22 Sep 2023
Towards Effective Disambiguation for Machine Translation with Large Language Models
Vivek Iyer
Pinzhen Chen
Alexandra Birch
19
11
0
20 Sep 2023
Summarization is (Almost) Dead
Xiao Pu
Mingqi Gao
Xiaojun Wan
HILM
81
39
0
18 Sep 2023
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada
Varun Gumma
Adrian de Wynter
Harshita Diddee
Mohamed Ahmed
Monojit Choudhury
Kalika Bali
Sunayana Sitaram
ALM
LM&MA
ELM
35
63
0
14 Sep 2023
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
Wei Qi Leong
Jian Gang Ngui
Yosephine Susanto
Hamsawardhini Rengarajan
Kengatharaiyer Sarveswaran
William-Chandra Tjhi
29
9
0
12 Sep 2023
Automating Behavioral Testing in Machine Translation
Javier Ferrando
Matthias Sperber
Hendra Setiawan
Dominic Telaar
Saša Hasan
27
2
0
05 Sep 2023
Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
Daniel Deutsch
Juraj Juraska
M. Finkelstein
Markus Freitag
46
11
0
25 Aug 2023
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao
Guanyi Chen
Xulang Zhang
Frank Guerin
Min Zhang
ELM
LM&MA
36
101
0
24 Aug 2023
Using ChatGPT as a CAT tool in Easy Language translation
Silvana Deilen
Sergio Hernández Garrido
Ekaterina Lapshinova-Koltunski
Christiane Maaß
18
10
0
22 Aug 2023
Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies
Yilun Liu
Shimin Tao
Weibin Meng
Jingyu Wang
Wenbing Ma
Yanqing Zhao
Yuhang Chen
Hao Yang
Yanfei Jiang
Xun Chen
45
24
0
15 Aug 2023
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
Patrick Fernandes
Daniel Deutsch
M. Finkelstein
Parker Riley
André F. T. Martins
Graham Neubig
Ankush Garg
J. Clark
Markus Freitag
Orhan Firat
LRM
36
68
0
14 Aug 2023
Three Ways of Using Large Language Models to Evaluate Chat
Ondřej Plátek
Vojtěch Hudeček
Patrícia Schmidtová
Mateusz Lango
Ondřej Dušek
ALM
19
6
0
12 Aug 2023
Generating Faithful Text From a Knowledge Graph with Noisy Reference Text
Tahsina Hashem
Weiqing Wang
Derry Wijaya
Mohammed Eunus Ali
Yuan-Fang Li
29
3
0
12 Aug 2023
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
Ying Zhao
Yu Bowen
Binyuan Hui
Haiyang Yu
Fei Huang
Yongbin Li
N. Zhang
44
23
0
10 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
46
3
0
08 Aug 2023
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation
Xianfeng Zeng
Yanjun Liu
Fandong Meng
Jie Zhou
26
0
0
06 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
61
82
0
03 Aug 2023
CHATREPORT: Democratizing Sustainability Disclosure Analysis through LLM-based Tools
Jingwei Ni
J. Bingler
Chiara Colesanti-Senni
Mathias Kraus
Glen Gostlow
...
Qian Wang
Nicolas Webersinke
Tobias Wekhof
Ting Yu
Markus Leippold
37
29
0
28 Jul 2023
ARB: Advanced Reasoning Benchmark for Large Language Models
Tomohiro Sawada
Daniel Paleka
Alexander Havrilla
Pranav Tadepalli
Paula Vidas
Alexander Kranias
John J. Nay
Kshitij Gupta
Aran Komatsuzaki
ELM
LRM
45
37
0
25 Jul 2023
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Adian Liusie
Potsawee Manakul
Mark J. F. Gales
ELM
29
35
0
15 Jul 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
30
13
0
22 Jun 2023
Knowledge-Prompted Estimator: A Novel Approach to Explainable Machine Translation Assessment
Hao Yang
Min Zhang
Shimin Tao
Minghan Wang
Daimeng Wei
Yanfei Jiang
LRM
15
10
0
13 Jun 2023
Iterative Translation Refinement with Large Language Models
Pinzhen Chen
Zhicheng Guo
Barry Haddow
Kenneth Heafield
LRM
17
22
0
06 Jun 2023
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
Tianyi Tang
Hongyuan Lu
Yuchen Eleanor Jiang
Haoyang Huang
Dongdong Zhang
Wayne Xin Zhao
Tom Kocmi
Furu Wei
20
5
0
24 May 2023
Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa
Jifan Chen
Junyi Jessy Li
Greg Durrett
43
8
0
24 May 2023
Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration
Daniel Deutsch
George F. Foster
Markus Freitag
21
42
0
23 May 2023