Perception Score, A Learned Metric for Open-ended Text Generation Evaluation
Jing Gu, Qingyang Wu, Zhou Yu
arXiv:2008.03082, 7 August 2020

Papers citing "Perception Score, A Learned Metric for Open-ended Text Generation Evaluation"

Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu, JongWoo Kim, MunYong Yi. 21 Feb 2025.

Evaluation of Text Generation: A Survey
Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao. 26 Jun 2020. Tags: ELM, LM&MA.

Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models
Wangchunshu Zhou, Ke Xu. 12 Feb 2020. Tags: ELM, ALM.

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger. 05 Sep 2019.

BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi. 21 Apr 2019.

Unifying Human and Statistical Evaluation for Natural Language Generation
Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang. 04 Apr 2019.

The price of debiasing automatic metrics in natural language evaluation
Arun Tejasvi Chaganty, Stephen Mussmann, Percy Liang. 06 Jul 2018.

Learning to Evaluate Image Captioning
Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge J. Belongie. 17 Jun 2018.

Learning Confidence for Out-of-Distribution Detection in Neural Networks
Terrance Devries, Graham W. Taylor. 13 Feb 2018. Tags: OOD, OODD.

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
Ryan J. Lowe, Michael Noseworthy, Iulian Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau. 23 Aug 2017.

Why We Need New Evaluation Metrics for NLG
Jekaterina Novikova, Ondrej Dusek, Amanda Cercas Curry, Verena Rieser. 21 Jul 2017.

Wasserstein GAN
Martín Arjovsky, Soumith Chintala, Léon Bottou. 26 Jan 2017. Tags: GAN.

How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
Chia-Wei Liu, Ryan J. Lowe, Iulian Serban, Michael Noseworthy, Laurent Charlin, Joelle Pineau. 25 Mar 2016.

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal, Zoubin Ghahramani. 06 Jun 2015. Tags: UQCV, BDL.

Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, C. L. Zitnick. 01 Apr 2015.