ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2412.01340
  4. Cited By
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
v1v2 (latest)

A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls

3 January 2025
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice Oh
Seohyon Jung
ArXiv (abs)PDFHTML

Papers citing "A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls"

24 / 24 papers shown
Title
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
Ryan Liu
Jiayi Geng
Addison J. Wu
Ilia Sucholutsky
Tania Lombrozo
Thomas Griffiths
ReLMLRM
158
33
0
27 Oct 2024
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang
Wei Zhao
Steffen Eger
144
10
0
24 Oct 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague
Fangcong Yin
Juan Diego Rodriguez
Dongwei Jiang
Manya Wadhwa
Prasann Singhal
Xinyu Zhao
Xi Ye
Kyle Mahowald
Greg Durrett
ReLMLRM
250
132
0
18 Sep 2024
Findings of the WMT 2023 Shared Task on Discourse-Level Literary
  Translation: A Fresh Orb in the Cosmos of LLMs
Findings of the WMT 2023 Shared Task on Discourse-Level Literary Translation: A Fresh Orb in the Cosmos of LLMs
Longyue Wang
Zhaopeng Tu
Yan Gu
Siyou Liu
Dian Yu
...
Bonnie Webber
Philipp Koehn
Andy Way
Yulin Yuan
Shuming Shi
88
20
0
06 Nov 2023
GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4
GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4
Tom Kocmi
C. Federmann
102
83
0
21 Oct 2023
xCOMET: Transparent Machine Translation Evaluation through Fine-grained
  Error Detection
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection
Nuno M. Guerreiro
Ricardo Rei
Daan van Stigt
Luísa Coheur
Pierre Colombo
André F.T. Martins
120
139
0
16 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language
  Models
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALMLM&MAELM
113
240
0
12 Oct 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELMLLMAGALM
102
504
0
14 Aug 2023
SDXL: Improving Latent Diffusion Models for High-Resolution Image
  Synthesis
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell
Zion English
Kyle Lacey
A. Blattmann
Tim Dockhorn
Jonas Muller
Joe Penna
Robin Rombach
409
2,460
0
04 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALMOSLMELM
628
4,460
0
09 Jun 2023
Large Language Models are not Fair Evaluators
Large Language Models are not Fair Evaluators
Peiyi Wang
Lei Li
Liang Chen
Zefan Cai
Dawei Zhu
Binghuai Lin
Yunbo Cao
Qi Liu
Tianyu Liu
Zhifang Sui
ALM
166
575
0
29 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long
  Form Text Generation
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILMALM
259
705
0
23 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human
  Feedback
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
156
608
0
22 May 2023
Large language models effectively leverage document-level context for
  literary translation, but critical errors persist
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
100
89
0
06 Apr 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELMALMLM&MA
252
1,216
0
29 Mar 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation
  with Question Answering
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu
Benlin Liu
Jungo Kasai
Yizhong Wang
Mari Ostendorf
Ranjay Krishna
Noah A. Smith
EGVM
90
239
0
21 Mar 2023
Exploring Document-Level Literary Machine Translation with Parallel
  Paragraphs from World Literature
Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature
Katherine Thai
Marzena Karpinska
Kalpesh Krishna
Bill Ray
M. Inghilleri
John Wieting
Mohit Iyyer
71
48
0
25 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLMBDLVLMCLIP
583
4,443
0
28 Jan 2022
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
434
5,021
0
24 Feb 2021
COMET: A Neural Framework for MT Evaluation
COMET: A Neural Framework for MT Evaluation
Ricardo Rei
Craig Alan Stewart
Ana C. Farinha
A. Lavie
234
1,100
0
18 Sep 2020
BLEURT: Learning Robust Metrics for Text Generation
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
154
1,511
0
09 Apr 2020
Asking and Answering Questions to Evaluate the Factual Consistency of
  Summaries
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
Alex Jinpeng Wang
Kyunghyun Cho
M. Lewis
HILM
98
486
0
08 Apr 2020
BERTScore: Evaluating Text Generation with BERT
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
663
5,897
0
21 Apr 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLMSSLSSeg
2.0K
95,668
0
11 Oct 2018
1