USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

1 May 2020

Papers citing "USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation"

50 / 55 papers shown

Title
Towards Better Evaluation for Generated Patent Claims Lekang Jiang Pascal A Scherz Stephan Goetz ELM 30 0 0 16 May 2025
JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry Anum Afzal Alexandre Mercier Florian Matthes 65 0 0 29 Apr 2025
LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations Laura Dietz Oleg Zendel P. Bailey Charles L. A. Clarke Ellese Cotterill Jeff Dalton Faegheh Hasibi Mark Sanderson Nick Craswell ELM 50 0 0 27 Apr 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation Mingqi Gao Xinyu Hu Li Lin Xiaojun Wan 28 1 0 28 Jan 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge Aparna Elangovan Jongwoo Ko Lei Xu Mahsa Elyasi Ling Liu S. Bodapati Dan Roth 52 6 0 28 Jan 2025
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation Suvodip Dey M. Desarkar OffRL 46 0 0 20 Jan 2025
Measuring the Robustness of Reference-Free Dialogue Evaluation Systems Justin Vasselli Adam Nohejl Taro Watanabe AAML 54 0 0 12 Jan 2025
Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation Jaechang Kim Jinmin Goh Inseok Hwang Jaewoong Cho Jungseul Ok ELM 33 1 0 28 Oct 2024
4-LEGS: 4D Language Embedded Gaussian Splatting Gal Fiebelman Tamir Cohen Ayellet Morgenstern Peter Hedman Hadar Averbuch-Elor 3DGS 46 3 0 14 Oct 2024
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References Qiyuan Zhang Yufei Wang Tiezheng YU Yuxin Jiang Chuhan Wu ... Xin Jiang Lifeng Shang Ruiming Tang Fuyuan Lyu Chen Ma 31 4 0 07 Oct 2024
CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells Atharva Naik Marcus Alenius Daniel Fried Carolyn Rose 42 0 0 29 Sep 2024
ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues John Mendonça Isabel Trancoso A. Lavie 34 3 0 16 Jul 2024
Leveraging LLMs for Dialogue Quality Measurement Jinghan Jia A. Komma Timothy Leffel Xujun Peng Ajay Nagesh Tamer Soliman Aram Galstyan Anoop Kumar 42 5 0 25 Jun 2024
Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons Adian Liusie Vatsal Raina Yassir Fathullah Mark Gales 43 9 0 09 May 2024
RepEval: Effective Text Evaluation with LLM Representation Shuqian Sheng Yi Xu Tianhang Zhang Zanwei Shen Luoyi Fu Jiaxin Ding Lei Zhou Xinbing Wang Cheng Zhou 27 1 0 30 Apr 2024
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems Clemencia Siro Mohammad Aliannejadi Maarten de Rijke 43 3 0 15 Apr 2024
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects Minqian Liu Ying Shen Zhiyang Xu Yixin Cao Eunah Cho Vaibhav Kumar Reza Ghanadan Lifu Huang ELM LM&MA ALM 52 25 0 15 Nov 2023
Three Ways of Using Large Language Models to Evaluate Chat Ondvrej Plátek Vojtvech Hudevcek Patrícia Schmidtová Mateusz Lango Ondrej Dusek ALM 19 6 0 12 Aug 2023
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering Pei Ke Fei Huang Fei Mi Yasheng Wang Qun Liu Xiaoyan Zhu Minlie Huang ReLM ELM 36 10 0 13 Jul 2023
Psychological Metrics for Dialog System Evaluation Salvatore Giorgi Shreya Havaldar Farhan S. Ahmed Zuhaib Akhtar Shalaka Vaidya Gary Pan Pallavi V. Kulkarni H. Andrew Schwartz Joao Sedoc 22 2 0 24 May 2023
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks Anas Himmi Ekhine Irurozki Nathan Noiry Stéphan Clémençon Pierre Colombo 34 5 0 17 May 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment Yang Liu Dan Iter Yichong Xu Shuohang Wang Ruochen Xu Chenguang Zhu ELM ALM LM&MA 68 1,082 0 29 Mar 2023
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation Tianxing He Jingyu Zhang Tianle Wang Sachin Kumar Kyunghyun Cho James R. Glass Yulia Tsvetkov 40 44 0 20 Dec 2022
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment Chen Zhang L. F. D’Haro Qiquan Zhang Thomas Friedrichs Haizhou Li 26 7 0 18 Dec 2022
Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems Shiki Sato Yosuke Kishinami Hiroaki Sugiyama Reina Akama Ryoko Tokuhisa Jun Suzuki 21 2 0 19 Nov 2022
On the Effectiveness of Automated Metrics for Text Generation Systems Pius von Daniken Jan Deriu Don Tuggener Mark Cieliebak 33 3 0 24 Oct 2022
Deepfake Text Detection: Limitations and Opportunities Jiameng Pu Zain Sarwar Sifat Muhammad Abdullah A. Rehman Yoonjin Kim P. Bhattacharya M. Javed Bimal Viswanath AAML 24 54 0 17 Oct 2022
Towards a Unified Multi-Dimensional Evaluator for Text Generation Ming Zhong Yang Liu Da Yin Yuning Mao Yizhu Jiao Peng Liu Chenguang Zhu Heng Ji Jiawei Han ELM 45 256 0 13 Oct 2022
Evaluating Agent Interactions Through Episodic Knowledge Graphs Selene Báez Santamaría Piek Vossen T. Baier 28 2 0 22 Sep 2022
Open-Domain Dialog Evaluation using Follow-Ups Likelihood Maxime De Bruyn Ehsan Lotfi Jeska Buhmann Walter Daelemans 40 9 0 12 Sep 2022
Dialogue Evaluation with Offline Reinforcement Learning Nurul Lubis Christian Geishauser Hsien-Chin Lin Carel van Niekerk Michael Heck Shutong Feng Milica Gavsić OffRL 27 4 0 02 Sep 2022
The Glass Ceiling of Automatic Evaluation in Natural Language Generation Pierre Colombo Maxime Peyrard Nathan Noiry Robert West Pablo Piantanida 49 11 0 31 Aug 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation Longxuan Ma Ziyu Zhuang Weinan Zhang Mingda Li Ting Liu 29 4 0 17 Aug 2022
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue Pengfei Zhang Xiao-fei Hu Kaidong Yu Jian Wang Song-Bo Han Cao Liu C. Yuan 27 7 0 19 Jun 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning Prakhar Gupta Cathy Jiao Yi-Ting Yeh Shikib Mehri M. Eskénazi Jeffrey P. Bigham ALM 44 47 0 25 May 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges Shikib Mehri Jinho Choi L. F. D’Haro Jan Deriu M. Eskénazi ... David Traum Yi-Ting Yeh Zhou Yu Yizhe Zhang Chen Zhang 34 21 0 18 Mar 2022
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems Tianbo Ji Yvette Graham Gareth J. F. Jones Chenyang Lyu Qun Liu ALM 36 39 0 11 Mar 2022
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems Jan Deriu Don Tuggener Pius von Daniken Mark Cieliebak AAML 19 9 0 28 Feb 2022
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows Jianqiao Zhao Yanyang Li Wanyu Du Yangfeng Ji Dong Yu M. Lyu Liwei Wang 28 4 0 14 Feb 2022
Measuring Attribution in Natural Language Generation Models Hannah Rashkin Vitaly Nikolaev Matthew Lamm Lora Aroyo Michael Collins Dipanjan Das Slav Petrov Gaurav Singh Tomar Iulia Turc David Reitter 39 173 0 23 Dec 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems Chen Zhang João Sedoc L. F. D’Haro Rafael E. Banchs Alexander I. Rudnicky 22 36 0 03 Nov 2021
Better than Average: Paired Evaluation of NLP Systems Maxime Peyrard Wei-Ye Zhao Steffen Eger Robert West ELM 16 24 0 20 Oct 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization Lei Shen Haolan Zhan Xin Shen Hongshen Chen Xiaofang Zhao Xiao-Dan Zhu 43 17 0 14 Sep 2021
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation Mingkai Deng Bowen Tan Zhengzhong Liu Eric Xing Zhiting Hu 16 72 0 14 Sep 2021
How to Evaluate Your Dialogue Models: A Review of Approaches Xinmeng Li Wansen Wu Long Qin Quanjun Yin ELM 30 8 0 03 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling Emily Dinan Gavin Abercrombie A. S. Bergman Shannon L. Spruit Dirk Hovy Y-Lan Boureau Verena Rieser 43 105 0 07 Jul 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation Prakhar Gupta Yulia Tsvetkov Jeffrey P. Bigham 42 22 0 10 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics Yi-Ting Yeh M. Eskénazi Shikib Mehri 36 105 0 07 Jun 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation Chen Zhang Yiming Chen L. F. D’Haro Yan Zhang Thomas Friedrichs Grandee Lee Haizhou Li 24 73 0 02 Jun 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey Jinjie Ni Tom Young Vlad Pandelea Fuzhao Xue Min Zhang 54 268 0 10 May 2021