FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation

25 October 2022

Chen Zhang

Haizhou Li

Papers citing "FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation"

Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations Karthik Gopalakrishnan Behnam Hedayatnia Qinlang Chen Anna Gottardi Sanjeev Kwatra Anu Venkatesh Raefer Gabriel Dilek Z. Hakkani-Tür AI4MH BDL 65 330 0 23 Aug 2023
DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations Sarik Ghazarian Nuan Wen Aram Galstyan Nanyun Peng 41 40 0 18 Mar 2022
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows Jianqiao Zhao Yanyang Li Wanyu Du Yangfeng Ji Dong Yu Michael R. Lyu Liwei Wang 43 4 0 14 Feb 2022
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents Eric Michael Smith Orion Hsu Rebecca Qian Stephen Roller Y-Lan Boureau Jason Weston 55 67 0 12 Jan 2022
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation Chen Zhang L. F. D’Haro Thomas Friedrichs Haizhou Li ELM 42 18 0 14 Dec 2021
Automatic Evaluation and Moderation of Open-domain Dialogue Systems Chen Zhang João Sedoc L. F. D’Haro Rafael E. Banchs Alexander I. Rudnicky 38 37 0 03 Nov 2021
Revisiting Self-Training for Few-Shot Learning of Language Model Yiming Chen Yan Zhang Chen Zhang Grandee Lee Ran Cheng Haizhou Li 37 42 0 04 Oct 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics Ananya B. Sai Tanay Dixit D. Y. Sheth S. Mohan Mitesh M. Khapra AAML 121 58 0 13 Sep 2021
Beyond Goldfish Memory: Long-Term Open-Domain Conversation Jing Xu Arthur Szlam Jason Weston RALM 33 250 0 15 Jul 2021
A Comprehensive Assessment of Dialog Evaluation Metrics Yi-Ting Yeh M. Eskénazi Shikib Mehri 53 107 0 07 Jun 2021
Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances Zekang Li Jinchao Zhang Zhengcong Fei Yang Feng Jie Zhou 38 57 0 04 Jun 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation Chen Zhang Yiming Chen L. F. D’Haro Yan Zhang Thomas Friedrichs Grandee Lee Haizhou Li 39 73 0 02 Jun 2021
Retrieval Augmentation Reduces Hallucination in Conversation Kurt Shuster Spencer Poff Moya Chen Douwe Kiela Jason Weston HILM 69 721 0 15 Apr 2021
I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling Yixin Nie Mary Williamson Joey Tianyi Zhou Douwe Kiela Jason Weston 53 83 0 24 Dec 2020
Overview of the Ninth Dialog System Technology Challenge: DSTC9 Chulaka Gunasekara Seokhwan Kim L. F. D’Haro Abhinav Rastogi Yun-Nung Chen ... A. Geramifard Satwik Kottur Seungwhan Moon Shivani Poddar R. Subba 86 75 0 12 Nov 2020
Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems Vitou Phy Yang Zhao Akiko Aizawa 24 55 0 01 Nov 2020
GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems Lishan Huang Zheng Ye Jinghui Qin Liang Lin Xiaodan Liang 28 103 0 08 Oct 2020
Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining Ananya B. Sai Akash Kumar Mohankumar Siddharth Arora Mitesh M. Khapra 33 74 0 23 Sep 2020
Dialogue Response Ranking Training with Large-Scale Human Feedback Data Xiang Gao Yizhe Zhang Michel Galley Chris Brockett Bill Dolan ALM 52 105 0 15 Sep 2020
Multi-Task Learning with Deep Neural Networks: A Survey M. Crawshaw CVBM 151 615 0 10 Sep 2020
Unsupervised Evaluation of Interactive Dialog with DialoGPT Shikib Mehri M. Eskénazi 40 177 0 23 Jun 2020
Learning an Unreferenced Metric for Online Dialogue Evaluation Koustuv Sinha Prasanna Parthasarathi Jasmine Wang Ryan J. Lowe William L. Hamilton Joelle Pineau OffRL 48 84 0 01 May 2020
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation Shikib Mehri M. Eskénazi 41 220 0 01 May 2020
Recipes for building an open-domain chatbot Stephen Roller Emily Dinan Naman Goyal Da Ju Mary Williamson ... Myle Ott Kurt Shuster Eric Michael Smith Y-Lan Boureau Jason Weston ALM 107 1,001 0 28 Apr 2020
Towards a Human-like Open-Domain Chatbot Daniel De Freitas Minh-Thang Luong David R. So Jamie Hall Noah Fiedel ... Zi Yang Apoorv Kulshreshtha Gaurav Nemade Yifeng Lu Quoc V. Le 61 931 0 27 Jan 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury ... Sasank Chilamkurthy Benoit Steiner Lu Fang Junjie Bai Soumith Chintala ODL 242 42,038 0 03 Dec 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Victor Sanh Lysandre Debut Julien Chaumond Thomas Wolf 121 7,386 0 02 Oct 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy M. Lewis Luke Zettlemoyer Veselin Stoyanov AIMat 398 24,160 0 26 Jul 2019
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems Asma Ghandeharioun J. Shen Natasha Jaques Craig Ferguson Noah J. Jones Àgata Lapedriza Rosalind W. Picard 47 91 0 21 Jun 2019
Survey on Evaluation Methods for Dialogue Systems Jan Deriu Álvaro Rodrigo Arantxa Otegi Guillermo Echegoyen S. Rosset Eneko Agirre Mark Cieliebak 50 280 0 10 May 2019
Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings Sarik Ghazarian Johnny Tian-Zheng Wei Aram Galstyan Nanyun Peng 36 90 0 24 Apr 2019
What makes a good conversation? How controllable attributes affect human judgments A. See Stephen Roller Douwe Kiela Jason Weston 70 287 0 22 Feb 2019
The Second Conversational Intelligence Challenge (ConvAI2) Emily Dinan V. Logacheva Valentin Malykh Alexander H. Miller Kurt Shuster ... Alexander I. Rudnicky Jason Williams Joelle Pineau Andrey Kravchenko Jason Weston DRL 84 363 0 31 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 943 93,936 0 11 Oct 2018
Coherence Models for Dialogue Alessandra Cervone Evgeny A. Stepanov Giuseppe Riccardi 49 24 0 21 Jun 2018
Personalizing Dialogue Agents: I have a dog, do you have pets too? Saizheng Zhang Emily Dinan Jack Urbanek Arthur Szlam Douwe Kiela Jason Weston 80 1,442 0 22 Jan 2018
DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset Yanran Li Hui Su Xiaoyu Shen Wenjie Li Ziqiang Cao Shuzi Niu 48 1,291 0 11 Oct 2017
RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems Chongyang Tao Lili Mou Dongyan Zhao Rui Yan 51 217 0 11 Jan 2017
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation Chia-Wei Liu Ryan J. Lowe Iulian Serban Michael Noseworthy Laurent Charlin Joelle Pineau 84 1,292 0 25 Mar 2016