ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons

6 September 2019

Jason Weston

Papers citing "ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons"


LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation Eunsu Kim Juyoung Suk Seungone Kim Niklas Muennighoff Dongkwan Kim Alice H. Oh ELM 88 1 0 31 Dec 2024
Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots Ekaterina Svikhnushina Pearl Pu 24 0 0 12 Sep 2024
Self-Emotion Blended Dialogue Generation in Social Simulation Agents Qiang Zhang Jason Naradowsky Yusuke Miyao 25 2 0 03 Aug 2024
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems Clemencia Siro Mohammad Aliannejadi Maarten de Rijke 35 3 0 15 Apr 2024
Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation Jian Wang Yi Cheng Dongding Lin Chak Tou Leong Wenjie Li 24 16 0 11 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives Yuchen Yang VGen 19 7 0 09 Oct 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training Zeqiu Wu Yushi Hu Weijia Shi Nouha Dziri Alane Suhr Prithviraj Ammanabrolu Noah A. Smith Mari Ostendorf Hannaneh Hajishirzi ALM 30 304 0 02 Jun 2023
Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models Qiang Zhang Jason Naradowsky Yusuke Miyao ELM 26 32 0 29 May 2023
Psychological Metrics for Dialog System Evaluation Salvatore Giorgi Shreya Havaldar Farhan S. Ahmed Zuhaib Akhtar Shalaka Vaidya Gary Pan Pallavi V. Kulkarni H. A. Schwartz Joao Sedoc 22 2 0 24 May 2023
Building Multimodal AI Chatbots Mingyu Lee 29 3 0 21 Apr 2023
Ontologically Faithful Generation of Non-Player Character Dialogues Nathaniel Weir Ryan Thomas Randolph D'Amore Kellie Hill Benjamin Van Durme Harsh Jhamtani 31 6 0 20 Dec 2022
BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets Minju Kim Chaehyeong Kim Yongho Song Seung-won Hwang Jinyoung Yeo 36 13 0 23 Oct 2022
Keep Me Updated! Memory Management in Long-term Conversations Sanghwan Bae Donghyun Kwak Soyoung Kang Min Young Lee Sungdong Kim Yuin Jeong Hyeri Kim Sang-Woo Lee W. Park Nako Sung 40 46 0 17 Oct 2022
Evaluating Agent Interactions Through Episodic Knowledge Graphs Selene Báez Santamaría Piek Vossen T. Baier 26 2 0 22 Sep 2022
PCC: Paraphrasing with Bottom-k Sampling and Cyclic Learning for Curriculum Data Augmentation Hongyuan Lu W. Lam 16 9 0 17 Aug 2022
GODEL: Large-Scale Pre-Training for Goal-Directed Dialog Baolin Peng Michel Galley Pengcheng He Chris Brockett Lars Liden E. Nouri Zhou Yu Bill Dolan Jianfeng Gao VLM 44 73 0 22 Jun 2022
KETOD: Knowledge-Enriched Task-Oriented Dialogue Zhiyu Zoey Chen Bing-Quan Liu Seungwhan Moon Chinnadhurai Sankar Paul A. Crook William Yang Wang 27 19 0 11 May 2022
State-of-the-art in Open-domain Conversational AI: A Survey Tosin P. Adewumi F. Liwicki Marcus Liwicki 29 15 0 02 May 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges Shikib Mehri Jinho Choi L. F. D’Haro Jan Deriu M. Eskénazi ... David Traum Yi-Ting Yeh Zhou Yu Yizhe Zhang Chen Zhang 30 21 0 18 Mar 2022
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training Yuxian Gu Jiaxin Wen Hao Sun Yi Song Pei Ke ... Zheng Zhang Jianzhu Yao Lei Liu Xiaoyan Zhu Minlie Huang 21 55 0 17 Mar 2022
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems Tianbo Ji Yvette Graham Gareth J. F. Jones Chenyang Lyu Qun Liu ALM 31 39 0 11 Mar 2022
A Literature Survey of Recent Advances in Chatbots Guendalina Caldarini Sardar F. Jaf K. McGarry AI4CE 35 274 0 17 Jan 2022
Self-Supervised Bot Play for Conversational Recommendation with Justifications Shuyang Li Bodhisattwa Prasad Majumder Julian McAuley 33 7 0 09 Dec 2021
Reason first, then respond: Modular Generation for Knowledge-infused Dialogue Leonard Adolphs Kurt Shuster Jack Urbanek Arthur Szlam Jason Weston KELM LRM 204 41 0 09 Nov 2021
SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures Megan Ung Jing Xu Y-Lan Boureau 8 47 0 14 Oct 2021
Teaching Models new APIs: Domain-Agnostic Simulators for Task Oriented Dialogue Moya Chen Paul A. Crook Stephen Roller ALM 48 7 0 13 Oct 2021
Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions Mohammad Aliannejadi Julia Kiseleva A. Chuklin Jeffrey Stephen Dalton Mikhail Burtsev 73 96 0 13 Sep 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling Emily Dinan Gavin Abercrombie A. S. Bergman Shannon L. Spruit Dirk Hovy Y-Lan Boureau Verena Rieser 37 105 0 07 Jul 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation Chen Zhang Yiming Chen L. F. D’Haro Yan Zhang Thomas Friedrichs Grandee Lee Haizhou Li 24 73 0 02 Jun 2021
HERALD: An Annotation Efficient Method to Detect User Disengagement in Social Conversations Weixin Liang Kai-Hui Liang Zhou Yu 34 15 0 01 Jun 2021
LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing Yu Li Josh Arnold Feifan Yan Weiyan Shi Zhou Yu ELM 26 11 0 05 May 2021
Collaborative Storytelling with Large-scale Neural Language Models Eric Nichols Leo Gao R. Gomez 23 43 0 20 Nov 2020
Adding Chit-Chat to Enhance Task-Oriented Dialogues Kai Sun Seungwhan Moon Paul A. Crook Stephen Roller Becka Silvert Bing-Quan Liu Zhiguang Wang Honglei Liu Eunjoon Cho Claire Cardie 75 66 0 24 Oct 2020
An Evaluation Protocol for Generative Conversational Systems Seolhwa Lee Heuiseok Lim João Sedoc ELM 35 10 0 24 Oct 2020
Learning to summarize from human feedback Nisan Stiennon Long Ouyang Jeff Wu Daniel M. Ziegler Ryan J. Lowe Chelsea Voss Alec Radford Dario Amodei Paul Christiano ALM 19 1,984 0 02 Sep 2020
A Survey of Evaluation Metrics Used for NLG Systems Ananya B. Sai Akash Kumar Mohankumar Mitesh M. Khapra ELM 33 228 0 27 Aug 2020
PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning Siqi Bao H. He Fan Wang Hua-Hong Wu Haifeng Wang Wenquan Wu Zhen Guo Zhibin Liu Xinchao Xu 30 137 0 30 Jun 2020
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions Stephen Roller Y-Lan Boureau Jason Weston Antoine Bordes Emily Dinan ... Kurt Shuster Eric Michael Smith Arthur Szlam Jack Urbanek Mary Williamson LLMAG AI4CE 28 51 0 22 Jun 2020
Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation Weixin Liang James Zou Zhou Yu ELM 34 33 0 21 May 2020
Learning an Unreferenced Metric for Online Dialogue Evaluation Koustuv Sinha Prasanna Parthasarathi Jasmine Wang Ryan J. Lowe William L. Hamilton Joelle Pineau OffRL 21 84 0 01 May 2020
XPersona: Evaluating Multilingual Personalized Chatbot Zhaojiang Lin Zihan Liu Genta Indra Winata Samuel Cahyawijaya Andrea Madotto Yejin Bang Etsuko Ishii Pascale Fung 45 57 0 17 Mar 2020
Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue Byeongchang Kim Jaewoo Ahn Gunhee Kim BDL 38 167 0 18 Feb 2020
Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training Margaret Li Stephen Roller Ilia Kulikov Sean Welleck Y-Lan Boureau Kyunghyun Cho Jason Weston 17 180 0 10 Nov 2019
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation Emily Dinan Angela Fan Adina Williams Jack Urbanek Douwe Kiela Jason Weston 27 205 0 10 Nov 2019
The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents Kurt Shuster Da Ju Stephen Roller Emily Dinan Y-Lan Boureau Jason Weston 17 81 0 09 Nov 2019
Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring Samuel Humeau Kurt Shuster Marie-Anne Lachaux Jason Weston 24 279 0 22 Apr 2019
Image Chat: Engaging Grounded Conversations Kurt Shuster Samuel Humeau Antoine Bordes Jason Weston 23 115 0 02 Nov 2018
Deep Reinforcement Learning for Dialogue Generation Jiwei Li Will Monroe Alan Ritter Michel Galley Jianfeng Gao Dan Jurafsky 214 1,327 0 05 Jun 2016