arXiv: 2505.16129
LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods

22 May 2025
Hyang Cui
    LRM

Papers cited by "LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods"

11 / 11 papers shown
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
359
1,641
0
22 Jan 2025
The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate
Juhyun Oh
Eunsu Kim
Inha Cha
Alice Oh
ELM
59
9
0
09 Feb 2024
Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation
Xu Huang
Zhirui Zhang
Xiang Geng
Yichao Du
Jiajun Chen
Shujian Huang
55
10
0
12 Jan 2024
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection
Nuno M. Guerreiro
Ricardo Rei
Daan van Stigt
Luísa Coheur
Pierre Colombo
André F.T. Martins
99
135
0
16 Oct 2023
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Guan Wang
Sijie Cheng
Xianyuan Zhan
Xiangang Li
Sen Song
Yang Liu
ALM
118
247
0
20 Sep 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
280
11,828
0
18 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
322
4,298
0
09 Jun 2023
Large Language Models Are State-of-the-Art Evaluators of Translation Quality
Tom Kocmi
C. Federmann
ELM
82
361
0
28 Feb 2023
COMET: A Neural Framework for MT Evaluation
Ricardo Rei
Craig Alan Stewart
Ana C. Farinha
A. Lavie
104
1,091
0
18 Sep 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
92
1,496
0
09 Apr 2020
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
307
5,801
0
21 Apr 2019