Perception Score, A Learned Metric for Open-ended Text Generation Evaluation

7 August 2020

Papers citing "Perception Score, A Learned Metric for Open-ended Text Generation Evaluation"

Title | Authors | Tags | Counts | Date
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu, JongWoo Kim, MunYong Yi | | 86 4 0 | 21 Feb 2025
Evaluation of Text Generation: A Survey | Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao | ELM, LM&MA | 89 380 0 | 26 Jun 2020
Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models | Wangchunshu Zhou, Ke Xu | ELM, ALM | 27 43 0 | 12 Feb 2020
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance | Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger | | 132 592 0 | 05 Sep 2019
BERTScore: Evaluating Text Generation with BERT | Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi | | 228 5,668 0 | 21 Apr 2019
Unifying Human and Statistical Evaluation for Natural Language Generation | Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang | | 52 223 0 | 04 Apr 2019
The price of debiasing automatic metrics in natural language evaluation | Arun Tejasvi Chaganty, Stephen Mussmann, Percy Liang | | 42 116 0 | 06 Jul 2018
Learning to Evaluate Image Captioning | Huayu Chen, Guandao Yang, Andreas Veit, Xun Huang, Serge J. Belongie | | 55 147 0 | 17 Jun 2018
Learning Confidence for Out-of-Distribution Detection in Neural Networks | Terrance Devries, Graham W. Taylor | OOD, OODD | 75 584 0 | 13 Feb 2018
Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses | Ryan J. Lowe, Michael Noseworthy, Iulian Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau | | 46 372 0 | 23 Aug 2017
Why We Need New Evaluation Metrics for NLG | Jekaterina Novikova, Ondrej Dusek, Amanda Cercas Curry, Verena Rieser | | 69 456 0 | 21 Jul 2017
Wasserstein GAN | Martín Arjovsky, Soumith Chintala, Léon Bottou | GAN | 152 4,822 0 | 26 Jan 2017
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation | Chia-Wei Liu, Ryan J. Lowe, Iulian Serban, Michael Noseworthy, Laurent Charlin, Joelle Pineau | | 91 1,292 0 | 25 Mar 2016
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning | Y. Gal, Zoubin Ghahramani | UQCV, BDL | 533 9,233 0 | 06 Jun 2015
Microsoft COCO Captions: Data Collection and Evaluation Server | Xinlei Chen, Hao Fang, Nayeon Lee, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, C. L. Zitnick | | 174 2,461 0 | 01 Apr 2015