Large Language Models Are State-of-the-Art Evaluators of Translation Quality

28 February 2023
Tom Kocmi
C. Federmann
    ELM

Papers citing "Large Language Models Are State-of-the-Art Evaluators of Translation Quality"

50 / 229 papers shown
AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs
Clemencia Siro
Yifei Yuan
Mohammad Aliannejadi
Maarten de Rijke
ELM
25
3
0
25 Oct 2024
From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items
Melissa Roemmele
Andrew S. Gordon
35
1
0
18 Oct 2024
HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in Real-World Multilingual Settings
Varun Gumma
Anandhita Raghunath
Mohit Jain
Sunayana Sitaram
LM&MA
34
1
0
17 Oct 2024
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Zonghai Yao
Aditya Parashar
Huixue Zhou
Won Seok Jang
Feiyun Ouyang
Zhichao Yang
Hong-ye Yu
ELM
53
2
0
17 Oct 2024
Data Processing for the OpenGPT-X Model Family
Nicolo' Brandizzi
Hammam Abdelwahab
Anirban Bhowmick
Lennard Helmer
Benny Jörg Stein
...
Georg Rehm
Dennis Wegener
Nicolas Flores-Herr
Joachim Kohler
Johannes Leveling
VLM
81
2
0
11 Oct 2024
Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?
Shenbin Qian
Constantin Orasan
Diptesh Kanojia
Félix do Carmo
ELM
27
0
0
08 Oct 2024
Fortify Your Foundations: Practical Privacy and Security for Foundation Model Deployments In The Cloud
Marcin Chrapek
Anjo Vahldiek-Oberwagner
Marcin Spoczynski
Scott Constable
Mona Vij
Torsten Hoefler
37
1
0
08 Oct 2024
Language Model-Driven Data Pruning Enables Efficient Active Learning
Abdul Hameed Azeemi
I. Qazi
Agha Ali Raza
VLM
36
1
0
05 Oct 2024
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
Juraj Juraska
Daniel Deutsch
Mara Finkelstein
Markus Freitag
39
14
0
04 Oct 2024
What do Large Language Models Need for Machine Translation Evaluation?
Shenbin Qian
Archchana Sindhujan
Minnie Kabra
Diptesh Kanojia
Constantin Orasan
Tharindu Ranasinghe
Frédéric Blain
ELM
LRM
ALM
LM&MA
35
0
0
04 Oct 2024
A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content
Shenbin Qian
Constantin Orasan
Diptesh Kanojia
Félix do Carmo
33
0
0
04 Oct 2024
AIME: AI System Optimization via Multiple LLM Evaluators
Bhrij Patel
Souradip Chakraborty
Wesley A Suttle
Mengdi Wang
Amrit Singh Bedi
Dinesh Manocha
29
8
0
04 Oct 2024
InstaTrans: An Instruction-Aware Translation Framework for Non-English Instruction Datasets
Yungi Kim
Chanjun Park
31
0
0
02 Oct 2024
Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing
Deokhyung Kang
Seonjeong Hwang
Yunsu Kim
Gary Geunbae Lee
31
0
0
01 Oct 2024
MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators
Qingyu Lu
Liang Ding
Kanjian Zhang
Jinxia Zhang
Dacheng Tao
35
3
0
22 Sep 2024
What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning
Shashidhar Reddy Javaji
Zining Zhu
ELM
ALM
39
0
0
19 Sep 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao
Feifan Song
Yibo Miao
Zefan Cai
Zhengyuan Yang
...
Houfeng Wang
Zhifang Sui
Peiyi Wang
Baobao Chang
53
12
0
04 Sep 2024
Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss
Christian Poelitz
Ian Drosos
Vu Le
Nick McKenna
Carina Negreanu
Chris Parnin
Advait Sarkar
ELM
ALM
35
13
0
16 Aug 2024
LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification
Zhen Qin
Junru Wu
Jiaming Shen
Tianqi Liu
Xuanhui Wang
63
3
0
06 Aug 2024
Questionnaires for Everyone: Streamlining Cross-Cultural Questionnaire Adaptation with GPT-Based Translation Quality Evaluation
Otso Haavisto
Robin Welsch
24
0
0
30 Jul 2024
Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models
Kenza Benkirane
Laura Gongas
Shahar Pelles
Naomi Fuchs
Joshua Darmon
Pontus Stenetorp
David Ifeoluwa Adelani
Eduardo Sánchez
HILM
40
4
0
23 Jul 2024
Benchmarks as Microscopes: A Call for Model Metrology
Michael Stephen Saxon
Ari Holtzman
Peter West
William Y. Wang
Naomi Saphra
39
10
0
22 Jul 2024
Fine-grained Gender Control in Machine Translation with Large Language Models
Minwoo Lee
Hyukhun Koh
Minsu Kim
Kyomin Jung
38
0
0
21 Jul 2024
A Survey on Failure Analysis and Fault Injection in AI Systems
Guangba Yu
Gou Tan
Haojia Huang
Zhenyu Zhang
Pengfei Chen
Roberto Natella
Zibin Zheng
36
3
0
28 Jun 2024
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
Aakanksha
Arash Ahmadian
Beyza Ermis
Seraphina Goldfarb-Tarrant
Julia Kreutzer
Marzieh Fadaee
Sara Hooker
40
28
0
26 Jun 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
Christoph Leiter
Steffen Eger
34
8
0
26 Jun 2024
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
A. Bavaresco
Raffaella Bernardi
Leonardo Bertolazzi
Desmond Elliott
Raquel Fernández
...
David Schlangen
Alessandro Suglia
Aditya K Surikuchi
Ece Takmaz
A. Testoni
ALM
ELM
54
62
0
26 Jun 2024
Themis: Towards Flexible and Interpretable NLG Evaluation
Xinyu Hu
Li Lin
Mingqi Gao
Xunjian Yin
Xiaojun Wan
ELM
34
6
0
26 Jun 2024
PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data
Ishaan Watts
Varun Gumma
Aditya Yadavalli
Vivek Seshadri
Manohar Swaminathan
Sunayana Sitaram
ELM
48
9
0
21 Jun 2024
MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language
Shun Wang
Ge Zhang
Han Wu
Tyler Loakman
Wenhao Huang
Chenghua Lin
40
2
0
19 Jun 2024
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
Sumanth Doddapaneni
Mohammed Safi Ur Rahman Khan
Sshubam Verma
Mitesh Khapra
42
11
0
19 Jun 2024
A Two-dimensional Zero-shot Dialogue State Tracking Evaluation Method using GPT-4
Ming Gu
Yan Yang
23
0
0
17 Jun 2024
Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation
Xiaoman Wang
Claudio Fantinuoli
27
1
0
14 Jun 2024
Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation
Jie Ruan
Wenqing Wang
Xiaojun Wan
AAML
ELM
36
3
0
12 Jun 2024
Are Large Language Models Actually Good at Text Style Transfer?
Sourabrata Mukherjee
Atul Kr. Ojha
Ondrej Dusek
29
11
0
09 Jun 2024
Large Language Models as Evaluators for Recommendation Explanations
Xiaoyu Zhang
Yishan Li
Jiayin Wang
Bowen Sun
Weizhi Ma
Peijie Sun
Min Zhang
LRM
ELM
48
12
0
05 Jun 2024
Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
Rongwu Xu
Zehan Qi
Wei Xu
LRM
SILM
64
6
0
31 May 2024
A Full-duplex Speech Dialogue Scheme Based On Large Language Models
Peng Wang
Songshuo Lu
Yaohua Tang
Sijie Yan
Yuanjun Xiong
Wei Xia
AuLLM
36
10
0
29 May 2024
SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation
Kun Zhao
Bohao Yang
Chen Tang
Chenghua Lin
Liang Zhan
46
5
0
24 May 2024
Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models
Abhishek Kumar
Sarfaroz Yunusov
Ali Emami
41
3
0
23 May 2024
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Zhiyuan Zeng
Xiaonan Li
...
Qinyuan Cheng
Ding Wang
Xiaofeng Mou
Xipeng Qiu
XuanJing Huang
LRM
46
4
0
21 May 2024
What Have We Achieved on Non-autoregressive Translation?
Yafu Li
Huajian Zhang
Jianhao Yan
Yongjing Yin
Yue Zhang
33
1
0
21 May 2024
Language Models can Evaluate Themselves via Probability Discrepancy
Tingyu Xia
Bowen Yu
Yuan Wu
Yi-Ju Chang
Chang Zhou
ELM
37
4
0
17 May 2024
LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play
Li-Chun Lu
Shou-Jen Chen
Tsung-Min Pai
Chan-Hung Yu
Hung-yi Lee
Shao-Hua Sun
LLMAG
56
39
0
10 May 2024
Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
Adian Liusie
Vatsal Raina
Yassir Fathullah
Mark J. F. Gales
43
9
0
09 May 2024
Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models
Yang Bai
Ge Pei
Jindong Gu
Yong Yang
Xingjun Ma
33
10
0
09 May 2024
Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph
Vladyslav Nechakhin
Jennifer D'Souza
Steffen Eger
54
4
0
03 May 2024
Large Language Models are Inconsistent and Biased Evaluators
Rickard Stureborg
Dimitris Alikaniotis
Yoshi Suhara
ALM
47
51
0
02 May 2024
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Pat Verga
Sebastian Hofstatter
Sophia Althammer
Yixuan Su
Aleksandra Piktus
Arkady Arkhangorodsky
Minjie Xu
Naomi White
Patrick Lewis
ALM
ELM
37
87
0
29 Apr 2024
LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study
Van Bach Nguyen
Paul Youssef
Jorg Schlotterer
Christin Seifert
39
14
0
26 Apr 2024