ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.10635
  4. Cited By
Better Automatic Evaluation of Open-Domain Dialogue Systems with
  Contextualized Embeddings

Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings

24 April 2019
Sarik Ghazarian
Johnny Tian-Zheng Wei
Aram Galstyan
Nanyun Peng
ArXivPDFHTML

Papers citing "Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings"

28 / 28 papers shown
Title
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
Suvodip Dey
M. Desarkar
OffRL
46
0
0
20 Jan 2025
Context Does Matter: Implications for Crowdsourced Evaluation Labels in
  Task-Oriented Dialogue Systems
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems
Clemencia Siro
Mohammad Aliannejadi
Maarten de Rijke
43
3
0
15 Apr 2024
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
74
1,090
0
29 Mar 2023
Improving Open-Domain Dialogue Evaluation with a Causal Inference Model
Improving Open-Domain Dialogue Evaluation with a Causal Inference Model
Cat P. Le
Luke Dai
Michael Johnston
Yang Liu
M. Walker
R. Ghanadan
ELM
19
10
0
31 Jan 2023
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
Chen Zhang
L. F. D’Haro
Qiquan Zhang
Thomas Friedrichs
Haizhou Li
26
7
0
18 Dec 2022
EnDex: Evaluation of Dialogue Engagingness at Scale
EnDex: Evaluation of Dialogue Engagingness at Scale
Guangxuan Xu
Ruibo Liu
Fabrice Harel-Canada
Nischal Reddy Chandra
Nanyun Peng
21
5
0
22 Oct 2022
StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning
StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning
Hong Chen
D. Vo
Hiroya Takamura
Yusuke Miyao
Hideki Nakayama
30
20
0
16 Oct 2022
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
Longxuan Ma
Ziyu Zhuang
Weinan Zhang
Mingda Li
Ting Liu
29
4
0
17 Aug 2022
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for
  Evaluating Open-Domain Dialogue
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue
Pengfei Zhang
Xiao-fei Hu
Kaidong Yu
Jian Wang
Song-Bo Han
Cao Liu
C. Yuan
27
7
0
19 Jun 2022
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating
  Controlled Text Generation
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation
Pei Ke
Hao Zhou
Yankai Lin
Peng Li
Jie Zhou
Xiaoyan Zhu
Minlie Huang
21
38
0
02 Apr 2022
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog
  Evaluation
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Sarik Ghazarian
Behnam Hedayatnia
Alexandros Papangelis
Yang Liu
Dilek Z. Hakkani-Tür
30
19
0
25 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation
  of Dialog: Research Directions and Challenges
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
34
21
0
18 Mar 2022
DEAM: Dialogue Coherence Evaluation using AMR-based Semantic
  Manipulations
DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations
Sarik Ghazarian
Nuan Wen
Aram Galstyan
Nanyun Peng
27
40
0
18 Mar 2022
Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Huihan Li
Tianyu Gao
Manan Goenka
Danqi Chen
24
21
0
16 Dec 2021
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue
  Evaluation
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chen Zhang
L. F. D’Haro
Thomas Friedrichs
Haizhou Li
ELM
25
18
0
14 Dec 2021
Identifying Untrustworthy Samples: Data Filtering for Open-domain
  Dialogues with Bayesian Optimization
Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization
Lei Shen
Haolan Zhan
Xin Shen
Hongshen Chen
Xiaofang Zhao
Xiao-Dan Zhu
43
17
0
14 Sep 2021
Synthesizing Adversarial Negative Responses for Robust Response Ranking
  and Evaluation
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation
Prakhar Gupta
Yulia Tsvetkov
Jeffrey P. Bigham
42
22
0
10 Jun 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
A Comprehensive Assessment of Dialog Evaluation Metrics
Yi-Ting Yeh
M. Eskénazi
Shikib Mehri
36
105
0
07 Jun 2021
DynaEval: Unifying Turn and Dialogue Level Evaluation
DynaEval: Unifying Turn and Dialogue Level Evaluation
Chen Zhang
Yiming Chen
L. F. D’Haro
Yan Zhang
Thomas Friedrichs
Grandee Lee
Haizhou Li
24
73
0
02 Jun 2021
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
Jian Guan
Zhexin Zhang
Zhuoer Feng
Zitao Liu
Wenbiao Ding
Xiaoxi Mao
Changjie Fan
Minlie Huang
20
60
0
19 May 2021
Meta-evaluation of Conversational Search Evaluation Metrics
Meta-evaluation of Conversational Search Evaluation Metrics
Zeyang Liu
K. Zhou
Max L. Wilson
ELM
32
17
0
27 Apr 2021
Deconstruct to Reconstruct a Configurable Evaluation Metric for
  Open-Domain Dialogue Systems
Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems
Vitou Phy
Yang Zhao
Akiko Aizawa
14
55
0
01 Nov 2020
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
Jian Guan
Minlie Huang
29
69
0
16 Sep 2020
A Survey of Evaluation Metrics Used for NLG Systems
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
33
230
0
27 Aug 2020
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of
  Current Evaluation Protocols
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols
Sarah E. Finch
Jinho Choi
ELM
29
67
0
10 Jun 2020
Towards a Human-like Open-Domain Chatbot
Towards a Human-like Open-Domain Chatbot
Daniel De Freitas
Minh-Thang Luong
David R. So
Jamie Hall
Noah Fiedel
...
Zi Yang
Apoorv Kulshreshtha
Gaurav Nemade
Yifeng Lu
Quoc V. Le
42
924
0
27 Jan 2020
Adversarial Evaluation of Dialogue Models
Adversarial Evaluation of Dialogue Models
Anjuli Kannan
Oriol Vinyals
AAML
ALM
141
76
0
27 Jan 2017
OpenNMT: Open-Source Toolkit for Neural Machine Translation
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein
Yoon Kim
Yuntian Deng
Jean Senellart
Alexander M. Rush
273
1,896
0
10 Jan 2017
1