2302.14520
Cited By
Large Language Models Are State-of-the-Art Evaluators of Translation Quality
28 February 2023
Tom Kocmi
C. Federmann
ELM
Papers citing "Large Language Models Are State-of-the-Art Evaluators of Translation Quality"
50 / 226 papers shown
Title
Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models
Tobias Domhan
Dawei Zhu
33
0
0
03 May 2025
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
31
0
0
29 Apr 2025
Multilingual Performance Biases of Large Language Models in Education
Vansh Gupta
Sankalan Pal Chowdhury
Vilém Zouhar
Donya Rooein
Mrinmaya Sachan
AI4Ed
LRM
50
0
0
24 Apr 2025
Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: A Pilot Study
Andy Li
Wei Zhou
Rashina Hoda
Chris Bain
Peter Poon
LM&MA
ELM
35
0
0
23 Apr 2025
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
Mohit Gupta
Akiko Aizawa
R. Shah
LM&MA
ELM
30
0
0
21 Apr 2025
Testing LLMs' Capabilities in Annotating Translations Based on an Error Typology Designed for LSP Translation: First Experiments with ChatGPT
Joachim Minder
Guillaume Wisniewski
Natalie Kübler
28
0
0
21 Apr 2025
Translation Analytics for Freelancers: I. Introduction, Data Preparation, Baseline Evaluations
Yuri Balashov
Alex Balashov
Shiho Fukuda Koski
31
0
0
20 Apr 2025
Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling
Shaomu Tan
Christof Monz
42
0
0
18 Apr 2025
LLM Sensitivity Evaluation Framework for Clinical Diagnosis
Chenwei Yan
Xiangling Fu
Yuxuan Xiong
Tianyi Wang
Siu Cheung Hui
Ji Wu
Xien Liu
LM&MA
ELM
35
0
0
18 Apr 2025
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni
Jiachen Pu
Zhongyi Yang
Kun Zhou
Hui Wang
Xiaoliang Xiao
Dakui Wang
Xin Li
Jingfeng Luo
Conggang Hu
37
0
0
18 Apr 2025
An LLM-as-a-judge Approach for Scalable Gender-Neutral Translation Evaluation
Andrea Piergentili
Beatrice Savoldi
Matteo Negri
L. Bentivogli
ELM
37
0
0
16 Apr 2025
LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA
Xanh Ho
Jiahao Huang
Florian Boudin
Akiko Aizawa
ELM
36
0
0
16 Apr 2025
Deep Reasoning Translation via Reinforcement Learning
Jiaan Wang
Fandong Meng
Jie Zhou
OffRL
LRM
33
0
0
14 Apr 2025
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?
Daniil Larionov
Sotaro Takeshita
Ran Zhang
Yanran Chen
Christoph Leiter
Zhipin Wang
Christian Greisinger
Steffen Eger
ReLM
ELM
LRM
72
1
0
10 Apr 2025
Regional Tiny Stories: Using Small Models to Compare Language Learning and Tokenizer Performance
Nirvan Patil
Malhar Abhay Inamdar
Agnivo Gosai
Guruprasad Pathak
Anish Joshi
Aryan Sagavekar
Anish Joshirao
Raj Abhijit Dandekar
Rajat Dandekar
Sreedath Panat
46
0
0
07 Apr 2025
Taxonomy-Aware Evaluation of Vision-Language Models
Vésteinn Snæbjarnarson
Kevin Du
Niklas Stoehr
Serge Belongie
Ryan Cotterell
Nico Lang
Stella Frank
32
0
0
07 Apr 2025
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation
Baban Gain
Dibyanayan Bandyopadhyay
Asif Ekbal
LM&MA
58
1
0
02 Apr 2025
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
Vivek Iyer
Ricardo Rei
Pinzhen Chen
Alexandra Birch
SyDa
LM&MA
70
0
0
29 Mar 2025
Improving LLM-based Document-level Machine Translation with Multi-Knowledge Fusion
Bin Liu
Xinglin Lyu
Junhui Li
Daimeng Wei
M. Zhang
Shimin Tao
Hao Yang
61
0
0
15 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
49
1
0
14 Mar 2025
Group Preference Alignment: Customized LLM Response Generation from In-Situ Conversations
Ishani Mondal
Jack W. Stokes
S. Jauhar
Longqi Yang
Mengting Wan
Xiaofeng Xu
Xia Song
Jennifer Neville
53
0
0
11 Mar 2025
Is Your Video Language Model a Reliable Judge?
M. Liu
Wensheng Zhang
64
2
0
07 Mar 2025
Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
Yafu Li
Ronghao Zhang
Zhilin Wang
Huajian Zhang
Leyang Cui
Yongjing Yin
Tong Xiao
Yue Zhang
76
0
0
06 Mar 2025
How Do Hackathons Foster Creativity? Towards AI Collaborative Evaluation of Creativity at Scale
Jeanette Falk
Yiyi Chen
Janet Rafner
Mike Zhang
Johannes Bjerva
Alexander Nolte
63
1
0
06 Mar 2025
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
Jinghao Zhang
Alireza Salemi
Xinting Hu
Luu Anh Tuan
Fuli Feng
Hamed Zamani
Xiangnan He
Tat-Seng Chua
3DV
79
2
0
04 Mar 2025
Argument Summarization and its Evaluation in the Era of Large Language Models
Moritz Altemeyer
Steffen Eger
Johannes Daxenberger
Tim Altendorf
Philipp Cimiano
Benjamin Schiller
LM&MA
ELM
LRM
67
0
0
02 Mar 2025
M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
Zhaopeng Feng
Jiayuan Su
Jiamei Zheng
Jiahan Ren
Yan Zhang
Jian Wu
Hongwei Wang
Zuozhu Liu
ELM
208
0
0
21 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
57
3
0
21 Feb 2025
Aligning Sentence Simplification with ESL Learner's Proficiency for Language Acquisition
Guanlin Li
Yuki Arase
Noel Crespi
54
0
0
17 Feb 2025
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang
Wenhao Zhu
Hanxu Hu
Conghui He
Lei Li
Shujian Huang
Fei Yuan
ELM
59
3
0
11 Feb 2025
Aligning Black-box Language Models with Human Judgments
Gerrit J. J. van den Burg
Gen Suzuki
Wei Liu
Murat Sensoy
ALM
82
0
0
07 Feb 2025
Speech Translation Refinement using Large Language Models
Huaixia Dou
Xinyu Tian
Xinglin Lyu
Jie Zhu
Junhui Li
Lifan Guo
149
0
0
28 Jan 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
Personalizing Education through an Adaptive LMS with Integrated LLMs
Kyle Spriggs
Meng Cheng Lau
Kalpdrum Passi
AI4Ed
57
0
0
24 Jan 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
41
2
0
21 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
132
69
0
20 Jan 2025
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation
Shunfan Zheng
Xiechi Zhang
Gerard de Melo
Xiaoling Wang
Linlin Wang
LM&MA
ELM
42
0
0
12 Jan 2025
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages
Archchana Sindhujan
Diptesh Kanojia
Constantin Orasan
Shenbin Qian
38
1
0
08 Jan 2025
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models
Hao Li
C. Bezemer
Ahmed E. Hassan
45
2
0
08 Jan 2025
RecSys Arena: Pair-wise Recommender System Evaluation with Large Language Models
Zhuo Wu
Qinglin Jia
Chuhan Wu
Zhaocheng Du
Shuai Wang
Z. Wang
Zhenhua Dong
OffRL
69
0
0
15 Dec 2024
Assessing the Impact of Conspiracy Theories Using Large Language Models
Bohan Jiang
Dawei Li
Zhen Tan
Xinyi Zhou
Ashwin Rao
Kristina Lerman
H. Bernard
Huan Liu
85
2
0
09 Dec 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
123
67
0
25 Nov 2024
LLMPirate: LLMs for Black-box Hardware IP Piracy
Vasudev Gohil
Matthew DeLorenzo
Veera Vishwa Achuta Sai Venkat Nallam
Joey See
Jeyavijayan Rajendran
69
3
0
25 Nov 2024
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
M. Finkelstein
Dan Deutsch
Parker Riley
Juraj Juraska
Geza Kovacs
Markus Freitag
76
0
0
23 Nov 2024
Neon: News Entity-Interaction Extraction for Enhanced Question Answering
Sneha Singhania
Silviu Cucerzan
Allen Herring
S. Jauhar
KELM
74
0
0
19 Nov 2024
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
Bo Yang
Qingping Yang
Runtao Liu
LRM
ReLM
ELM
AIMat
67
1
0
11 Nov 2024
Context-Informed Machine Translation of Manga using Multimodal Large Language Models
Philip Lippmann
Konrad Skublicki
Joshua Tanner
Shonosuke Ishiwatari
Jie-jin Yang
36
0
0
04 Nov 2024
AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs
Clemencia Siro
Yifei Yuan
Mohammad Aliannejadi
Maarten de Rijke
ELM
25
3
0
25 Oct 2024
From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items
Melissa Roemmele
Andrew S. Gordon
35
1
0
18 Oct 2024
HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in Real-World Multilingual Settings
Varun Gumma
Anandhita Raghunath
Mohit Jain
Sunayana Sitaram
LM&MA
34
1
0
17 Oct 2024