2004.04696
Cited By
BLEURT: Learning Robust Metrics for Text Generation
9 April 2020
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
Papers citing
"BLEURT: Learning Robust Metrics for Text Generation"
50 / 73 papers shown
Title
Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
Shrey Pandit
Ashwin Vinod
Liu Leqi
Ying Ding
HILM
46
0
0
23 May 2025
LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods
Hyang Cui
LRM
67
0
0
22 May 2025
Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models
Blanca Calvo Figueras
Rodrigo Agerri
ALM
ELM
LRM
162
1
0
16 May 2025
Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data
Wei Zou
Sen Yang
Yu Bao
Shujian Huang
Jiajun Chen
Shanbo Cheng
SyDa
89
1
0
20 Apr 2025
LLMs Can Achieve High-quality Simultaneous Machine Translation as Efficiently as Offline
Biao Fu
Minpeng Liao
Kai Fan
Chengxi Li
Li Zhang
Yidong Chen
Xiaodong Shi
OffRL
341
1
0
13 Apr 2025
FUSE: A Ridge and Random Forest-Based Metric for Evaluating MT in Indigenous Languages
Rahul Raja
A. Vats
112
1
0
28 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
148
7
0
08 Mar 2025
Leveraging Large Language Models for Building Interpretable Rule-Based Data-to-Text Systems
Jędrzej Warczyński
Mateusz Lango
Ondrej Dusek
63
0
0
28 Feb 2025
M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
Zhaopeng Feng
Jiayuan Su
Jiamei Zheng
Jiahan Ren
Yan Zhang
Jian Wu
Hongwei Wang
Zuozhu Liu
ELM
234
1
0
21 Feb 2025
Mind the Style Gap: Meta-Evaluation of Style and Attribute Transfer Metrics
Amalie Brogaard Pauli
Isabelle Augenstein
Ira Assent
117
0
0
20 Feb 2025
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
Wenwen Xie
Gray Gwizdz
Dongji Feng
112
0
0
20 Feb 2025
A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond
Shreya Shukla
Jose Torres
Abhijit Mishra
Jacek Gwizdka
Shounak Roychowdhury
81
0
0
20 Feb 2025
Learning to Substitute Words with Model-based Score Ranking
Hongye Liu
Ricardo Henao
120
0
0
09 Feb 2025
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu
Yao Chen
Zeyi Wen
Weiguo Liu
Bingsheng He
138
2
0
02 Feb 2025
MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models
Zhongpu Chen
Yixiao Liu
Long Shi
Zhi-Jie Wang
Xingyan Chen
Yu Zhao
Fuji Ren
87
1
0
28 Jan 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
55
2
0
28 Jan 2025
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
412
0
0
20 Jan 2025
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
Suvodip Dey
M. Desarkar
OffRL
77
0
0
20 Jan 2025
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice Oh
Seohyon Jung
148
1
0
03 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
162
26
0
03 Jan 2025
LLM-based Translation Inference with Iterative Bilingual Understanding
Andong Chen
Kehai Chen
Yang Xiang
Xuefeng Bai
Muyun Yang
Yang Feng
Tiejun Zhao
Min Zhang
LRM
107
5
0
31 Dec 2024
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos
Tomás Almeida
Daniel Vareta
Filipe Azevedo
Sweta Agrawal
Patrick Fernandes
André F. T. Martins
85
4
0
08 Nov 2024
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang
Wei Zhao
Steffen Eger
106
8
0
24 Oct 2024
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han
Akiko Eriguchi
Haoran Xu
Hieu T. Hoang
Marine Carpuat
Huda Khayrallah
VLM
69
3
0
12 Oct 2024
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
H. Xia
Zhengbang Yang
Junbo Zou
Rhys Tracy
Yuqing Wang
...
Xun Shao
Zhuoqing Xie
Yuan-fang Wang
Weining Shen
Hanjie Chen
ReLM
LRM
ELM
64
3
0
11 Oct 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad
Michael Toker
Zorik Gekhman
Roi Reichart
Idan Szpektor
Hadas Kotek
Yonatan Belinkov
HILM
AIFin
92
42
0
03 Oct 2024
Better Instruction-Following Through Minimum Bayes Risk
Ian Wu
Patrick Fernandes
Amanda Bertsch
Seungone Kim
Sina Pakazad
Graham Neubig
116
11
0
03 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
99
11
0
03 Oct 2024
Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
Kaden Uhlig
Joern Wuebker
Raphael Reinauer
John DeNero
80
0
0
26 Sep 2024
Your Weak LLM is Secretly a Strong Teacher for Alignment
Leitian Tao
Yixuan Li
109
8
0
13 Sep 2024
An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs
Eui Jun Hwang
Sukmin Cho
Junmyeong Lee
Jong C. Park
SLR
85
4
0
20 Aug 2024
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts
Jiaqing Liu
Chong Deng
Qinglin Zhang
Shilin Zhou
Hai Yu
Wen Wang
71
0
0
19 Aug 2024
On Speeding Up Language Model Evaluation
Jin Peng Zhou
Christian K. Belardi
Ruihan Wu
Travis Zhang
Carla P. Gomes
Wen Sun
Kilian Q. Weinberger
82
1
0
08 Jul 2024
Sentence-level Aggregation of Lexical Metrics Correlates Stronger with Human Judgements than Corpus-level Aggregation
Paulo Cavalin
P. Domingues
Claudio S. Pinhanez
71
1
0
03 Jul 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
99
12
0
26 May 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu
Jiahao Xu
Yulin Yuan
Gholamreza Haffari
Longyue Wang
Weihua Luo
Kaifu Zhang
LLMAG
150
27
0
20 May 2024
Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore
Junchao Wu
Runzhe Zhan
Derek F. Wong
Shu Yang
Xuebo Liu
Lidia S. Chao
Min Zhang
DeLMO
100
5
0
07 May 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
156
56
0
23 Apr 2024
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods
Hanlei Jin
Yang Zhang
Dan Meng
Jun Wang
Jinghua Tan
214
91
0
05 Mar 2024
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song
Mengyue Wu
Ke Zhu
Chunhao Zhang
Yanyi Chen
LRM
ELM
69
3
0
28 Feb 2024
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
Hanjie Chen
Zhouxiang Fang
Yash Singla
Mark Dredze
ELM
AI4MH
95
39
0
28 Feb 2024
Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation
Yun-Wei Chu
Dong-Jun Han
Christopher G. Brinton
95
4
0
15 Jan 2024
Semantic Consistency for Assuring Reliability of Large Language Models
Harsh Raj
Vipul Gupta
Domenic Rosati
S. Majumdar
HILM
132
14
0
17 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
106
3
0
08 Aug 2023
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
96
15
0
11 Jun 2022
Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR
Juri Opitz
Anette Frank
107
35
0
20 Aug 2020
Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation
Ran Tian
Shashi Narayan
Thibault Sellam
Ankur P. Parikh
HILM
84
96
0
19 Oct 2019
Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)
Ondrej Dusek
Karin Sevegnani
Ioannis Konstas
Verena Rieser
ALM
44
9
0
10 Oct 2019
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao
Maxime Peyrard
Fei Liu
Yang Gao
Christian M. Meyer
Steffen Eger
171
598
0
05 Sep 2019
SumQE: a BERT-based Summary Quality Estimation Model
Stratos Xenouleas
Prodromos Malakasiotis
Marianna Apidianaki
Ion Androutsopoulos
45
37
0
02 Sep 2019